Professional Machine Learning Engineer Questions

Topic 1 Question 1

Exam Professional Machine Learning Engineer topic 1 question 1 discussion

You are building an ML model to detect anomalies in real-time sensor data. You will use Pub/Sub to handle incoming requests. You want to store the results for analytics and visualization. How should you configure the pipeline?

  • A. 1 = Dataflow, 2 = AI Platform, 3 = BigQuery
  • B. 1 = DataProc, 2 = AutoML, 3 = Cloud Bigtable
  • C. 1 = BigQuery, 2 = AutoML, 3 = Cloud Functions
  • D. 1 = BigQuery, 2 = AI Platform, 3 = Cloud Storage
Suggested Answer: A 🗳️

Comments

esuaaaa
Highly Voted 4 years, 5 months ago
Definitely A. Dataflow is must.
upvoted 24 times
...
inder0007
Highly Voted 4 years, 5 months ago
Even if I follow the link, it should be dataflow, AI-Platform and Bigquery. Real answer should be A
upvoted 13 times
...
bitsplease
Most Recent 1 month, 3 weeks ago
Selected Answer: A
AI Platform is now called Vertex AI. Dataproc Serverless can be configured with Spark Streaming; however, Dataflow is ideal. A is correct.
upvoted 1 times
...
yia20082000
7 months, 2 weeks ago
Selected Answer: A
Dataflow required and BQ at the end
upvoted 1 times
...
AWBY_sback
8 months, 2 weeks ago
Selected Answer: A
I think it's A.
upvoted 1 times
...
ki_123
11 months, 1 week ago
Selected Answer: A
BigQuery for the analytics and visualization.
upvoted 2 times
...
jkkim_jt
1 year ago
Selected Answer: A
ChatGPT Prompt: Using Google Pub/Sub, define an ML pipeline for anomaly detection.

```mermaid
graph TD;
  A[Data Source] -->|Pub/Sub Topic| B[Pub/Sub]
  B --> C[Dataflow for Preprocessing]
  C --> D["ML Model Inference (AI Platform)"]
  D -->|Prediction| E[BigQuery / Cloud Storage]
  D -->|Alert| F[Cloud Functions]
```
upvoted 1 times
...
nktyagi
1 year, 1 month ago
Selected Answer: A
To preprocess data you will use Dataflow, and then you can use the Vertex AI platform for training and serving. BigQuery is the recommended analytics store to manage this use case's storage at scale and reduce latency.
upvoted 2 times
...
LeumaS_NoswaY
1 year, 1 month ago
Pub/Sub -> Dataflow -> AI Platform -> BigQuery
upvoted 1 times
...
Yorko
1 year, 4 months ago
Selected Answer: A
BigQuery for analytics 100%
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: A
Big Query is ideal for analytics
upvoted 1 times
...
RPS007
1 year, 6 months ago
Selected Answer: A
Verified Answer
upvoted 1 times
...
Shreeti_Saha
1 year, 7 months ago
Option A
upvoted 1 times
...
fragkris
1 year, 11 months ago
Selected Answer: A
A - Dataflow is the only correct option for this case.
upvoted 1 times
...
RangasamyArran
2 years ago
AutoML is useful for labeled data, so it's either A or D. Dataflow is a must for the pipeline, so A is correct.
upvoted 1 times
...
LMDY
2 years, 1 month ago
Selected Answer: A
A. Definitely it's the correct answer
upvoted 1 times
...

Topic 1 Question 2


Your organization wants to make its internal shuttle service route more efficient. The shuttles currently stop at all pick-up points across the city every 30 minutes between 7 am and 10 am. The development team has already built an application on Google Kubernetes Engine that requires users to confirm their presence and shuttle station one day in advance. What approach should you take?

  • A. 1. Build a tree-based regression model that predicts how many passengers will be picked up at each shuttle station. 2. Dispatch an appropriately sized shuttle and provide the map with the required stops based on the prediction.
  • B. 1. Build a tree-based classification model that predicts whether the shuttle should pick up passengers at each shuttle station. 2. Dispatch an available shuttle and provide the map with the required stops based on the prediction.
  • C. 1. Define the optimal route as the shortest route that passes by all shuttle stations with confirmed attendance at the given time under capacity constraints. 2. Dispatch an appropriately sized shuttle and indicate the required stops on the map.
  • D. 1. Build a reinforcement learning model with tree-based classification models that predict the presence of passengers at shuttle stops as agents and a reward function around a distance-based metric. 2. Dispatch an appropriately sized shuttle and provide the map with the required stops based on the simulated outcome.
Suggested Answer: C 🗳️
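Option C's non-ML approach can be sketched directly: with attendance confirmed a day in advance, route selection is a small combinatorial search. A minimal sketch in Python, where the distance matrix, rider counts, and the brute-force strategy are all illustrative assumptions rather than anything from the question:

```python
# Illustrative sketch of option C (no ML): choose the shortest route through
# only the stations with confirmed attendance, under a capacity constraint.
# Brute force over permutations is fine for a handful of confirmed stops.
from itertools import permutations

def route_length(route, dist):
    # Sum pairwise distances along the route.
    return sum(dist[a][b] for a, b in zip(route, route[1:]))

def best_route(depot, confirmed, riders, dist, capacity):
    # Hypothetical capacity check: all confirmed riders must fit one shuttle.
    if sum(riders[s] for s in confirmed) > capacity:
        raise ValueError("dispatch a larger shuttle")
    candidates = ((depot,) + p for p in permutations(confirmed))
    return min(candidates, key=lambda r: route_length(r, dist))

dist = {
    "depot": {"A": 1, "B": 5, "C": 2},
    "A": {"depot": 1, "B": 1, "C": 4},
    "B": {"depot": 5, "A": 1, "C": 1},
    "C": {"depot": 2, "A": 4, "B": 1},
}
riders = {"A": 2, "B": 1, "C": 3}
print(best_route("depot", ["A", "B", "C"], riders, dist, capacity=8))
# ('depot', 'A', 'B', 'C') with total distance 3
```

For realistic numbers of stops this becomes a vehicle-routing problem and a heuristic solver would replace the permutation search, but the point stands: no prediction step is needed.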

Comments

nissili
Highly Voted 4 years, 4 months ago
C: for all confirmed.
upvoted 24 times
sensev
1 year, 1 month ago
I agree with this, because it mentioned that they now "require users to confirm their presence". I think this is an example of when a classical routing algorithm is a better fit than an ML approach.
upvoted 15 times
...
...
GalGilor
6 months, 4 weeks ago
Selected Answer: C
C: People are required to let you know ahead if they show up. You don't need to model whether people are going to be present in a station. Also, logically, the cost of not picking up people is significantly higher than the route taking a little more time. So, A and B, though possible, would not be optimal given that you already know where the people are located.
upvoted 1 times
...
JPA210
7 months, 2 weeks ago
Selected Answer: C
I agree with what is being said, that this use case is not for ML.
upvoted 1 times
...
VishnuCh
8 months ago
Selected Answer: C
This is a route optimization problem, not a machine learning problem.
upvoted 1 times
...
baimus
1 year, 1 month ago
The answer is C. This is a case where machine learning would be terrible, as it would not be 100% accurate and some passengers would not get picked up. A simple algorithm works better here, and the question confirms customers indicate their stop in advance, so no ML is required.
upvoted 3 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: C
C is the option that covers the scenario.
upvoted 1 times
...
fragkris
1 year, 11 months ago
Selected Answer: C
C - Since we have the attendance list in advance, tree-based classification, regression, and reinforcement learning all sound useless in this case.
upvoted 3 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: C
you do not need to predict how many people will be at each station as the requirement mentions they have to register a day in advance
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
n_shanthi
2 years, 7 months ago
I think it should be C. I can easily eliminate D, this is not a case for reinforcement learning. Moreover, it seems like a Route Optimization rather than finding out best sized shuttle as mentioned in A or whether the shuttle should stop at a point as per point B.
upvoted 1 times
...
asava
2 years, 8 months ago
Selected Answer: C
This is a route optimization problem
upvoted 1 times
...
EFIGO
2 years, 11 months ago
Selected Answer: C
No need to predict the presences since they are already confirmed, best thing we can do is optimize the route
upvoted 3 times
...
abhi0706
3 years ago
C. route more efficient is an optimization model
upvoted 1 times
...
GCP72
3 years, 2 months ago
Selected Answer: C
C looks correct to me.
upvoted 1 times
...
Dr_Ethan
3 years, 3 months ago
Confirmed C
upvoted 1 times
...
enghabeth
3 years, 3 months ago
Selected Answer: C
C. route more efficient is an optimization model
upvoted 2 times
...

Topic 1 Question 3


You were asked to investigate failures of a production line component based on sensor readings. After receiving the dataset, you discover that less than 1% of the readings are positive examples representing failure incidents. You have tried to train several classification models, but none of them converge. How should you resolve the class imbalance problem?

  • A. Use the class distribution to generate 10% positive examples.
  • B. Use a convolutional neural network with max pooling and softmax activation.
  • C. Downsample the data with upweighting to create a sample with 10% positive examples.
  • D. Remove negative examples until the numbers of positive and negative examples are equal.
Suggested Answer: C 🗳️
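The downsample-and-upweight technique behind option C can be sketched with plain Python. The dataset sizes and the 0.1 factor below are made-up toy values for illustration:

```python
# A toy sketch of the downsample-and-upweight technique from option C.
import random

random.seed(0)
positives = [("reading", 1) for _ in range(10)]   # <1% failure incidents
negatives = [("reading", 0) for _ in range(990)]  # majority class

downsample_factor = 0.1  # keep only 10% of the negatives
kept_negatives = random.sample(negatives, int(len(negatives) * downsample_factor))

# Upweight the downsampled class by 1/factor so the loss still reflects
# the true class distribution during training.
weighted = (
    [(x, y, 1.0 / downsample_factor) for x, y in kept_negatives]
    + [(x, y, 1.0) for x, y in positives]
)

pos_ratio = len(positives) / (len(positives) + len(kept_negatives))
print(round(pos_ratio, 3))  # ~0.092, close to option C's 10% positives
```

The weights would be passed to the training loop (e.g. as per-example sample weights), which is why this differs from option D: the negatives that remain still carry the statistical weight of the ones that were dropped.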

Comments

celia20200410
Highly Voted 4 years, 3 months ago
ANS: C https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data#downsampling-and-upweighting - less than 1% of the readings are positive - none of them converge. Downsampling (in this context) means training on a disproportionately low subset of the majority class examples.
upvoted 35 times
mousseUwU
4 years ago
Agree, C is correct
upvoted 3 times
...
jace_dl
6 months, 4 weeks ago
Doesn't deleting negative examples also shrink the training data, so why not D? Because the real distribution is not 1:1?
upvoted 1 times
jace_dl
6 months, 3 weeks ago
Aha, the problem says less than 1% are positive examples, so making them 1:1 would eliminate too many examples. I think I understood.
upvoted 1 times
...
...
...
MisterHairy
Highly Voted 1 year, 1 month ago
=New Question3= You are going to train a DNN regression model with Keras APIs using this code:

```python
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(
    256, use_bias=True, activation='relu',
    kernel_initializer=None, kernel_regularizer=None,
    input_shape=(500,)))
model.add(tf.keras.layers.Dropout(rate=0.25))
model.add(tf.keras.layers.Dense(
    128, use_bias=True, activation='relu',
    kernel_initializer='uniform', kernel_regularizer='l2'))
model.add(tf.keras.layers.Dropout(rate=0.25))
model.add(tf.keras.layers.Dense(
    2, use_bias=False, activation='softmax'))
model.compile(loss='mse')
```

How many trainable weights does your model have? (The arithmetic below is correct.)
A. 501*256 + 257*128 + 2 = 161154
B. 500*256 + 256*128 + 128*2 = 161024
C. 501*256 + 257*128 + 128*2 = 161408
D. 500*256*0.25 + 256*128*0.25 + 128*2 = 40448
upvoted 10 times
tooooony55
3 years, 10 months ago
B: Dense layers are 100% trainable weights; a dropout rate of 0.25 randomly drops 25% for regularization's sake, but you are still training 100% of the weights.
upvoted 1 times
suresh_vn
3 years, 2 months ago
C is correct. 2nd Layer with use_bias = True
upvoted 3 times
...
AlexZot
3 years, 9 months ago
Correct answer is C. Do not forget about bias term which is also trainable parameter.
upvoted 6 times
sakura65
3 years, 7 months ago
Why 128 for the last layer is correct and not 129 X 2?
upvoted 1 times
suresh_vn
3 years, 2 months ago
because of use_bias = False
upvoted 1 times
...
...
...
...
NickHapton
3 years, 10 months ago
Why do you post new questions in every existing question rather than post them as a new question?
upvoted 4 times
MisterHairy
3 years, 10 months ago
Only moderator can post new questions. Thus, I am left with this format. I have emailed the additional questions to the moderator, but he/she has not added them to the site. These questions were received off of other practice tests, but answers were not provided.
upvoted 4 times
...
...
MisterHairy
3 years, 10 months ago
Answer?
upvoted 1 times
...
Mohamed_Mossad
3 years, 5 months ago
D , is the only option that takes care of the dropout factor
upvoted 1 times
Mohamed_Mossad
3 years, 5 months ago
My bad, this was tricky: "The Dropout layer randomly disables neurons during training. They are still present in your model and therefore aren't discounted from the number of parameters in your model summary." So D is wrong. C and A take care of the bias, but C is correct.
upvoted 3 times
...
...
...
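For the trainable-weights question posted above, the arithmetic can be checked without TensorFlow: a Dense layer contributes in*out weights plus out bias terms when use_bias=True, and Dropout layers add no trainable parameters. A quick sketch:

```python
# Checking the trainable-parameter arithmetic for the Dense/Dropout stack
# in the question above, by hand (no TensorFlow required).
def dense_params(n_in, n_out, use_bias=True):
    # A Dense layer holds n_in * n_out weights, plus n_out bias terms.
    return n_in * n_out + (n_out if use_bias else 0)

total = (
    dense_params(500, 256, use_bias=True)     # 501 * 256 = 128256
    + dense_params(256, 128, use_bias=True)   # 257 * 128 = 32896
    + dense_params(128, 2, use_bias=False)    # 128 * 2   = 256
)
print(total)  # 161408, i.e. option C; Dropout contributes no parameters
```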
vamgcp
Most Recent 2 months ago
Selected Answer: C
C - This option combines two techniques: downsampling and upweighting. While a 1:1 ratio is often the goal, a 10% positive example ratio is a significant improvement from the original less than 1% and can be a practical compromise.
upvoted 1 times
...
YvetteTsai
3 months, 2 weeks ago
Selected Answer: C
The answer is C; it can converge without discarding all the useful negative samples. A is not clear about how it generates the samples and might cause overfitting. B is not directly related to the sampling issue. D is too extreme an undersampling.
upvoted 1 times
...
JPA210
7 months, 2 weeks ago
Selected Answer: C
downsampling is the clue here https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data#downsampling-and-upweighting
upvoted 1 times
...
ramen_lover
1 year, 1 month ago
The answer is C. We have to note that (1) downsampling the majority class, (2) upsampling the minority class, and (3) upweighting the minority class all work for imbalanced data. However, the key assumption in the question is that "You have tried to train several classification models, but none of them converge". You are not asked to tackle imbalanced data per se, but to handle the non-convergence problem (due to limited resources or the poorness of the algorithm). The official documentation says: "If you have an imbalanced data set, first try training on the true distribution. If the model works well and generalizes, you're done! If not, try the following downsampling and upweighting technique." https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data#downsampling-and-upweighting In other words, the downsampling-and-upweighting technique addresses the non-convergence problem, not just the imbalanced data.
upvoted 6 times
...
Fatiy
1 year, 1 month ago
Selected Answer: C
C. Downsample the data with upweighting to create a sample with 10% positive examples. Dealing with class imbalance can be challenging for machine learning models. One common approach to resolving the problem is to downsample the data, either by removing examples from the majority class or by oversampling the minority class. In this case, since you have very few positive examples, you would want to oversample the positive examples to create a sample that better represents the underlying distribution of the data. This could involve using upweighting, where positive examples are given a higher weight in the loss function to compensate for their relative scarcity in the data. This can help the model to better focus on the positive examples and improve its performance in classifying failure incidents.
upvoted 3 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: C
This approach involves downsampling the majority class (negative examples) and upweighting the minority class (positive examples) to create a balanced dataset. By doing so, the model can learn from both classes effectively. Reference: How to Handle Imbalanced Classes in Machine Learning [https://elitedatascience.com/imbalanced-classes]
upvoted 3 times
...
fragkris
1 year, 11 months ago
Selected Answer: C
C - Downsample the majority and add weights to it.
upvoted 2 times
...
tatpicc
1 year, 12 months ago
Max Pooling is a pooling operation that calculates the maximum value for patches of a feature map, and uses it to create a downsampled (pooled) feature map. It is usually used after a convolutional layer.
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
Puneet2022
2 years, 6 months ago
Selected Answer: C
https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data#downsampling-and-upweighting
upvoted 1 times
...
enghabeth
2 years, 9 months ago
Selected Answer: C
https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data
upvoted 1 times
...
SharathSH
2 years, 10 months ago
The answer would obviously be C. As the dataset is imbalanced and you need to resolve this issue to obtain the desired result, the best approach is to downsample the data.
upvoted 1 times
...
EFIGO
2 years, 11 months ago
Selected Answer: C
Best practice for imbalanced dataset is to downsample with upweight https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data#downsampling-and-upweighting
upvoted 1 times
...
GCP72
3 years, 2 months ago
Selected Answer: C
Correct answer is "C"
upvoted 1 times
...

Topic 1 Question 4


You want to rebuild your ML pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over 12 hours to run. To speed up development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting the speed and processing requirements?

  • A. Use Data Fusion's GUI to build the transformation pipelines, and then write the data into BigQuery.
  • B. Convert your PySpark into SparkSQL queries to transform the data, and then run your pipeline on Dataproc to write the data into BigQuery.
  • C. Ingest your data into Cloud SQL, convert your PySpark commands into SQL queries to transform the data, and then use federated queries from BigQuery for machine learning.
  • D. Ingest your data into BigQuery using BigQuery Load, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table.
Suggested Answer: D 🗳️

Comments

nunzio144
Highly Voted 1 year, 1 month ago
It should be D .... Data Fusion is not SQL syntax ....
upvoted 23 times
A4M
3 years, 9 months ago
D is the most suitable answer given the requirements in the question; Data Fusion is more of a no-code data transformation tool.
upvoted 1 times
...
q4exam
4 years, 2 months ago
Agree, BQ is the only serverless that support SQL
upvoted 5 times
...
OpenKnowledge
2 months, 1 week ago
Cloud SQL is not serverless; you still provision and manage instances with defined compute and storage resources
upvoted 1 times
OpenKnowledge
2 months, 1 week ago
Traditional Dataproc runs on Compute Engine, which requires manual cluster management. Google Cloud offers a serverless Dataproc offering, now called Google Cloud Serverless for Apache Spark, which does not need clusters or infrastructure management
upvoted 1 times
...
...
...
Celia20210714
Highly Voted 4 years, 3 months ago
ANS: A https://cloud.google.com/data-fusion#section-1
- Data Fusion is a serverless approach leveraging the scalability and reliability of Google services like Dataproc, which means Data Fusion offers the best of data integration capabilities with a lower total cost of ownership.
- BigQuery is serverless and supports SQL.
- Dataproc is not serverless; you have to manage clusters.
- Cloud SQL is not serverless; you have to manage instances.
upvoted 12 times
q4exam
4 years, 2 months ago
Data Fusion is not serverless; it creates Dataproc clusters to execute the job. I think the answer is C.
upvoted 1 times
mousseUwU
4 years ago
Data Fusion is serverless: https://cloud.google.com/data-fusion#all-features
upvoted 3 times
tavva_prudhvi
2 years, 8 months ago
I think you're only viewing the sentence "A serverless approach leveraging the scalability and reliability of Google services like Dataproc means Data Fusion offers the best of data integration capabilities with a lower total cost of ownership", The sentence implies that Data Fusion leverages a serverless approach, but it does not explicitly state that Data Fusion itself is serverless. It states that Data Fusion offers the best of data integration capabilities by using a serverless approach that leverages the scalability and reliability of Google services like Dataproc. So, while Data Fusion may not be fully serverless, it is designed to take advantage of serverless capabilities through its integration with Google services.
upvoted 2 times
...
...
...
mousseUwU
4 years ago
Agree, A is correct
upvoted 2 times
...
TornikePirveli
1 year, 2 months ago
By your logic it should be D, because BQ is fully serverless and supports SQL
upvoted 1 times
...
...
Rafa1312
Most Recent 1 month, 3 weeks ago
Selected Answer: D
It should be D.
A talks about a GUI, but they want SQL syntax.
B: Dataproc is not serverless.
C: Cloud SQL is too slow.
D is the correct answer.
upvoted 1 times
...
Moulichintakunta
5 months ago
Selected Answer: D
Answer is D. Hints: SQL, serverless, less time. Ans: BigQuery.
upvoted 3 times
...
danvic
5 months, 1 week ago
Selected Answer: B
I think the answer is B. Dataproc is a tool used for managing Spark clusters. If there are many files in Cloud Storage, loading them one by one into BigQuery might be tedious. To save time, we would also like to avoid translating the pipeline into another language.
upvoted 1 times
...
manualrg
10 months, 1 week ago
Selected Answer: B
Both B and D are valid IMHO (SQL + serverless: https://cloud.google.com/dataproc-serverless/docs/overview), but the book Official Google Cloud Certified Professional Machine Learning Engineer Study Guide says B (I think there are several errors in the book).
upvoted 2 times
...
joqu
11 months, 4 weeks ago
Selected Answer: D
People giving other answers are too hung up on the fact that it currently runs in PySpark. The data is in GCS and you want a quick serverless solution with SQL syntax; BigQuery is the only good option that "meets the speed and processing requirements".
upvoted 3 times
...
LeumaS_NoswaY
1 year, 1 month ago
B. You need Cloud Dataproc to transform the data from PySpark to Spark SQL
upvoted 1 times
...
TornikePirveli
1 year, 2 months ago
Serverless, SQL syntax -> BigQuery, simple as that
upvoted 3 times
...
jsalvasoler
1 year, 3 months ago
I am very curious. Why are the solutions (when I click Reveal Solution) generally WRONG?
upvoted 2 times
...
tadeupan
1 year, 3 months ago
Option D, because it needs a serverless solution and SQL syntax, and BigQuery offers this. Dataproc is not serverless, so B is incorrect; D is the correct option.
upvoted 3 times
...
Yorko
1 year, 4 months ago
Selected Answer: D
There's an updated version of this question in the official Google Cloud certified PMLE study guide. Option D is marked as correct
upvoted 2 times
TornikePirveli
1 year, 2 months ago
Can you link the updated version? On Amazon it's still 1st version and marked B
upvoted 1 times
...
...
PhilipKoku
1 year, 5 months ago
Selected Answer: D
The best approach is option D: Ingest data into BigQuery and use SQL queries for transformations. This leverages BigQuery’s serverless capabilities, efficient processing, and seamless integration with other Google Cloud services.
upvoted 2 times
...
fragkris
1 year, 11 months ago
Selected Answer: D
D - BigQuery is the only serverless and SQL-syntax option.
upvoted 1 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: D
D - as BQ is server less and supports SQL none of the other options match both criteria
upvoted 2 times
...
12112
2 years, 4 months ago
Selected Answer: D
I'll go with D.
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: D
Went with D
upvoted 3 times
...

Topic 1 Question 5


You manage a team of data scientists who use a cloud-based backend system to submit training jobs. This system has become very difficult to administer, and you want to use a managed service instead. The data scientists you work with use many different frameworks, including Keras, PyTorch, theano, Scikit-learn, and custom libraries. What should you do?

  • A. Use the AI Platform custom containers feature to receive training jobs using any framework.
  • B. Configure Kubeflow to run on Google Kubernetes Engine and receive training jobs through TF Job.
  • C. Create a library of VM images on Compute Engine, and publish these images on a centralized repository.
  • D. Set up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure.
Suggested Answer: A 🗳️

Comments

gcp2021go
Highly Voted 4 years, 5 months ago
the answer is A
upvoted 25 times
...
guruguru
Highly Voted 4 years, 3 months ago
A, because AI platform supported all the frameworks mentioned. And Kubeflow is not managed service in GCP. https://cloud.google.com/ai-platform/training/docs/getting-started-pytorch
upvoted 11 times
...
jkkim_jt
Most Recent 1 year ago
I think "custom container" should be written in capital letters like "AI Platform Custom Container". It is one of the features of AI Platform. It is a proper noun not a general term.
upvoted 3 times
...
Yorko
1 year, 4 months ago
Selected Answer: A
A. Now it's called Vertex AI
upvoted 5 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: A
The best approach is option A: Use AI Platform custom containers. It provides flexibility, scalability, and support for various frameworks, making it an ideal choice for your team’s needs.
upvoted 1 times
...
fragkris
1 year, 11 months ago
Selected Answer: A
Chose A
upvoted 1 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: A
A is the only Google-managed service solution. B and C are not managed; D is a third-party tool.
upvoted 2 times
...
M25
2 years, 6 months ago
Selected Answer: A
Went with A
upvoted 2 times
...
Antmal
2 years, 10 months ago
The answer must be D, as nowhere in the question has GCP been mentioned. https://aadityachapagain.com/2020/09/distributed-training-with-slurm-on-gcp/
upvoted 1 times
tavva_prudhvi
2 years, 8 months ago
D is incorrect, this is more far from a managed service based solution.
upvoted 2 times
...
...
ares81
2 years, 10 months ago
Selected Answer: A
It's A
upvoted 2 times
...
Moulichintakunta
2 years, 11 months ago
Selected Answer: D
Here the question is about workload management, not about supported frameworks. Slurm is a managed solution for workloads.
upvoted 1 times
...
EFIGO
2 years, 11 months ago
Selected Answer: A
Now it's Vertex AI (instead of AI Platform), but it's the best solution, no need to do anything more complicated
upvoted 4 times
...
abhi0706
3 years ago
A - Vertex AI now
upvoted 3 times
...
GCP72
3 years, 2 months ago
Selected Answer: A
Correct answer is "A"
upvoted 1 times
...
caohieu04
3 years, 8 months ago
Selected Answer: A
A is correct
upvoted 2 times
...
vinit1101
3 years, 9 months ago
Selected Answer: A
the answer is A
upvoted 2 times
...

Topic 1 Question 6


You work for an online retail company that is creating a visual search engine. You have set up an end-to-end ML pipeline on Google Cloud to classify whether an image contains your company's product. Expecting the release of new products in the near future, you configured a retraining functionality in the pipeline so that new data can be fed into your ML models. You also want to use AI Platform's continuous evaluation service to ensure that the models have high accuracy on your test dataset. What should you do?

  • A. Keep the original test dataset unchanged even if newer products are incorporated into retraining.
  • B. Extend your test dataset with images of the newer products when they are introduced to retraining.
  • C. Replace your test dataset with images of the newer products when they are introduced to retraining.
  • D. Update your test dataset with images of the newer products when your evaluation metrics drop below a pre-decided threshold.
Suggested Answer: B 🗳️
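Option B's "extend, don't replace" policy can be sketched in a few lines of Python. The dict shape and the `id` field below are illustrative assumptions, not anything from AI Platform's API:

```python
# Sketch of option B: when new products ship, append their labeled images to
# the existing test set instead of replacing it, so continuous evaluation
# still covers the older products (regression coverage) plus the new ones.
def extend_test_set(current, new_examples):
    seen = {ex["id"] for ex in current}
    # Keep every existing example; add only genuinely new ones.
    return current + [ex for ex in new_examples if ex["id"] not in seen]

old = [{"id": "shoe_001", "label": "product"},
       {"id": "img_9", "label": "not_product"}]
new = [{"id": "shoe_new_1", "label": "product"},
       {"id": "shoe_001", "label": "product"}]  # duplicate is skipped
combined = extend_test_set(old, new)
print(len(combined))  # 3: both originals survive, one new example is added
```

Replacing the set (option C) would silently drop coverage for older products, while keeping it frozen (option A) would never measure accuracy on the new ones.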

Comments

esuaaaa
Highly Voted 4 years, 5 months ago
I think B is the right answer. A doesn't make sense: if you don't use the new products, the evaluation becomes useless. C: the existing products are also necessary as data. D: I don't understand the need to wait until the threshold is exceeded.
upvoted 35 times
mousseUwU
4 years ago
Agree with you, B is correct
upvoted 1 times
...
q4exam
4 years, 2 months ago
Agree, B as it extends to new products.
upvoted 1 times
...
VincenzoP84
2 years, 6 months ago
D could have sense considering that is mentioned the intention to use AI Platform's continuous evaluation service
upvoted 2 times
...
maukaba
1 year, 12 months ago
It's D for two reasons: the question explicitly requires leveraging the continuous evaluation service, and the threshold check lets you decide when to perform the retraining, avoiding doing it for every single new data point.
upvoted 4 times
...
...
gcp2021go
Highly Voted 4 years, 5 months ago
answer is B
upvoted 11 times
...
bc3f222
Most Recent 8 months ago
Selected Answer: B
because other options are wrong
upvoted 1 times
...
joqu
11 months, 3 weeks ago
Selected Answer: D
Task: "You also want to use AI Platform's continuous evaluation service to ensure that the models have high accuracy on your test dataset." Docs: https://cloud.google.com/vertex-ai/docs/evaluation/introduction "After your model is deployed to production, periodically evaluate your model with new incoming data. If the evaluation metrics show that your model performance is degrading, consider re-training your model. This process is called continuous evaluation."
upvoted 1 times
...
503b759
1 year ago
D: It's definitely not a clear choice. B is the most obvious answer - you know you've got new data coming in, so why not incorporate it immediately into training. EXCEPT the question clearly states that Vertex continuous evaluation should feature.
upvoted 1 times
Franui
11 months, 1 week ago
I don't see why adding the data up front would mean that continuous evaluation is not featured; you would use it anyway
upvoted 1 times
...
...
MisterHairy
1 year, 1 month ago
=New Question 6= You work for a global footwear retailer and need to predict when an item will be out of stock based on historical inventory data. Customer behavior is highly dynamic since footwear demand is influenced by many different factors. You want to serve models that are trained on all available data, but track your performance on specific subsets of data before pushing to production. What is the most streamlined and reliable way to perform this validation? A. Use the TFX ModelValidator tools to specify performance metrics for production readiness. B. Use k-fold cross-validation as a validation strategy to ensure that your model is ready for production. C. Use the last relevant week of data as a validation set to ensure that your model is performing accurately on current data. D. Use the entire dataset and treat the area under the receiver operating characteristic curve (AUC ROC) as the main metric.
upvoted 4 times
wences
3 years, 9 months ago
A is the correct
upvoted 6 times
...
sid515
3 years, 10 months ago
B looks OK, as using cross-validation makes the testing results more even
upvoted 2 times
...
MisterHairy
3 years, 10 months ago
Answer?
upvoted 1 times
...
Magda123212321
3 years, 9 months ago
I think C. B is wrong if we train on all data (https://medium.com/@soumyachess1496/cross-validation-in-time-series-566ae4981ce4); when testing on time series we generally use the newest data, so C.
upvoted 3 times
...
...
harithacML
1 year, 1 month ago
Selected Answer: B
A. Keep the original test dataset unchanged even if newer products are incorporated into retraining: this would not test the new products. B. Extend your test dataset with images of the newer products when they are introduced to retraining: tests old and new products together. Great. C. Replace your test dataset with images of the newer products: the old products would no longer be tested, and old-product recognition might change when new products are added to training, so this option is not good. D. Update your test dataset when your evaluation metrics drop below a pre-decided threshold: why wait? No need.
upvoted 1 times
...
EFIGO
1 year, 1 month ago
Selected Answer: B
You need to correctly classify newer products, so you need the new data ==> A is wrong; you need to keep doing a good job on the older dataset, you can't just ignore it ==> C is wrong; you know when you are introducing new products, so there is no need to wait for a drop in performance ==> D is wrong; B is correct
upvoted 2 times
...
oddsoul
1 year, 1 month ago
Selected Answer: B
B correct
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: B
The best approach is option B: Extend your test dataset with images of the newer products. This ensures accurate evaluation as your product catalog evolves.
upvoted 1 times
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: B
My initial confusion with option B arose from the phrase "with images of the newer products when they are introduced to retraining." Initially, I mistakenly interpreted it as recommending the use of the same images in both training and testing, which is incorrect. However, upon further reflection, I realized that using the same product does not necessarily mean using identical images. Therefore, I now believe that option B is the most suitable choice.
upvoted 1 times
...
bugger123
1 year, 11 months ago
Selected Answer: B
A and C make no sense - you don't want to lose any of the performance on existing products. D - Why would you wait for your performance to drop in the first place? That's a reactive rather than proactive approach. The answer is B
upvoted 1 times
...
fragkris
1 year, 11 months ago
Selected Answer: B
B for sure
upvoted 1 times
...
Sum_Sum
1 year, 12 months ago
B is the only thing we do in practice
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 2 times
...
will7722
2 years, 8 months ago
Selected Answer: B
you can't just replace the old product data with the new products while the old products are still being sold
upvoted 2 times
...
SharathSH
2 years, 10 months ago
Ans: B. A would not use the newer data, hence not an ideal option. C: replacing is not a good option, as it swaps the older data for newer data, which in turn hampers accuracy. D: waiting for the threshold is not a better option.
upvoted 1 times
...

Topic 1 Question 7

You need to build classification workflows over several structured datasets currently stored in BigQuery. Because you will be performing the classification several times, you want to complete the following steps without writing code: exploratory data analysis, feature selection, model building, training, and hyperparameter tuning and serving. What should you do?

  • A. Configure AutoML Tables to perform the classification task.
  • B. Run a BigQuery ML task to perform logistic regression for the classification.
  • C. Use AI Platform Notebooks to run the classification model with pandas library.
  • D. Use AI Platform to run the classification model job configured for hyperparameter tuning.
Suggested Answer: A 🗳️

Comments

guruguru
Highly Voted 4 years, 3 months ago
A. Because BigQuery ML requires you to write code.
upvoted 29 times
...
gvk1
Most Recent 7 months ago
Selected Answer: B
The data is inside BigQuery, and BigQuery ML supports all the ML jobs except hyperparameter tuning. As custom training and custom data are not requirements, BigQuery ML queries can manage the job.
upvoted 1 times
...
plumbig11
10 months ago
Selected Answer: A
AutoML Tables is now Vertex AI Tabular Workflows
upvoted 1 times
...
tadeupan
1 year, 3 months ago
Create a model without doing literally anything, therefore AutoML. A.
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: A
A) AutoML Tables doesn't require code.
upvoted 2 times
...
Azhar10
1 year, 7 months ago
The question says 'over several structured datasets', meaning multiple datasets, and 'several times', meaning frequent use. Though BigQuery ML is not an absolutely no-code solution, all it needs is a very simple SQL query to train an ML model, so 'B' could be correct here; but the question also asks for hyperparameter tuning, which is not available in BigQuery ML, so the correct answer is 'A'.
upvoted 3 times
...
fragkris
1 year, 11 months ago
Selected Answer: A
A - AutoML is no code
upvoted 1 times
...
harithacML
2 years, 4 months ago
Selected Answer: A
Requirement: no code. A. Configure AutoML Tables to perform the classification task: no code. B. Run a BigQuery ML task to perform logistic regression for the classification: requires coding the logistic-regression model. C. Use AI Platform Notebooks to run the classification model with the pandas library: notebooks are code. D. Use AI Platform to run the classification model job configured for hyperparameter tuning: the job to execute still needs to be written.
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: A
Went with A
upvoted 1 times
...
Moulichintakunta
2 years, 11 months ago
Selected Answer: A
Because BigQuery ML doesn't cover many of the steps mentioned in the question.
upvoted 1 times
...
EFIGO
2 years, 11 months ago
Selected Answer: A
"without writing code" ==> AutoML A is correct
upvoted 1 times
...
abhi0706
3 years ago
Correct answer is "A"
upvoted 1 times
...
GCP72
3 years, 2 months ago
Selected Answer: A
Correct answer is "A"
upvoted 1 times
...
sachinxshrivastav
3 years, 3 months ago
Selected Answer: A
Because BigQuery ML requires writing code, A is the correct one.
upvoted 1 times
...
Mohamed_Mossad
3 years, 5 months ago
Selected Answer: A
"without writing code": only option A complies with this statement; all other options require writing code
upvoted 1 times
...
caohieu04
3 years, 8 months ago
Selected Answer: A
A is correct
upvoted 2 times
...
NamitSehgal
3 years, 10 months ago
A is correct https://cloud.google.com/automl-tables/docs/beginners-guide
upvoted 3 times
...

Topic 1 Question 8

You work for a public transportation company and need to build a model to estimate delay times for multiple transportation routes. Predictions are served directly to users in an app in real time. Because different seasons and population increases impact the data relevance, you will retrain the model every month. You want to follow Google-recommended best practices. How should you configure the end-to-end architecture of the predictive model?

  • A. Configure Kubeflow Pipelines to schedule your multi-step workflow from training to deploying your model.
  • B. Use a model trained and deployed on BigQuery ML, and trigger retraining with the scheduled query feature in BigQuery.
  • C. Write a Cloud Functions script that launches a training and deploying job on AI Platform that is triggered by Cloud Scheduler.
  • D. Use Cloud Composer to programmatically schedule a Dataflow job that executes the workflow from training to deploying your model.
Suggested Answer: A 🗳️

Comments

Paul_Dirac
Highly Voted 4 years, 4 months ago
Answer: A A. Kubeflow Pipelines can form an end-to-end architecture (https://www.kubeflow.org/docs/components/pipelines/overview/pipelines-overview/) and deploy models. B. BigQuery ML can't offer an end-to-end architecture because it must use another tool, like AI Platform, for serving models at the end of the process (https://cloud.google.com/bigquery-ml/docs/export-model-tutorial#online_deployment_and_serving). C. Cloud Scheduler can trigger the first step in a pipeline, but then some orchestrator is needed to continue the remaining steps. Besides, having Cloud Scheduler alone can't ensure failure handling during pipeline execution. D. A Dataflow job can't deploy models, it must use AI Platform at the end instead.
upvoted 43 times
mousseUwU
4 years ago
I guess it's A
upvoted 3 times
...
q4exam
4 years, 2 months ago
Dataflow can deploy a model... this is how you do streaming inference on a stream
upvoted 2 times
mousseUwU
4 years ago
Please send a source link?
upvoted 1 times
...
lordcenzin
3 years, 8 months ago
Yes you can, but it is not supposed to do that. Dataflow is for data processing and transformation; you would lose all the conveniences Kubeflow provides natively. Between the two answers, I think A is the most correct.
upvoted 4 times
...
...
...
gcp2021go
Highly Voted 4 years, 5 months ago
The answer is D. I found a similar explanation in this course; open for discussion. B could also work, but the question asks for end-to-end, thus I chose D instead of B: https://www.coursera.org/lecture/ml-pipelines-google-cloud/what-is-cloud-composer-CuXTQ
upvoted 11 times
tavva_prudhvi
2 years, 8 months ago
D is incorrect. Cloud Composer is a fully managed workflow orchestration service built on Apache Airflow. It is a recommended way by Google to schedule continuous training jobs. But it isn’t used to run the training jobs. AI Platform is used for training and deployment.
upvoted 2 times
...
...
harithacML
Most Recent 1 year, 1 month ago
Selected Answer: A
Req: retrain the model every month + Google-recommended best practices + end-to-end architecture. A. Configure Kubeflow Pipelines to schedule your multi-step workflow from training to deploying your model: supports all of the above. B. Use a model trained and deployed on BigQuery ML, and trigger retraining with the scheduled query feature in BigQuery: why BigQuery ML when Vertex AI/Kubeflow can handle it end to end? The BigQuery trigger only initiates the code run. C. Write a Cloud Functions script that launches a training and deploying job on AI Platform, triggered by Cloud Scheduler: not recommended by Google for end-to-end ML. D. Use Cloud Composer to programmatically schedule a Dataflow job: not recommended by Google for end-to-end ML; what if the model fails? Where is the metric monitoring?
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: A
A) Kubeflow Pipelines is the answer.
upvoted 1 times
...
fragkris
1 year, 11 months ago
Selected Answer: A
Chose A
upvoted 1 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: A
D: a Dataflow job can't deploy models. B and C are not complete solutions, leaving A as the correct one.
upvoted 1 times
...
suranga4
2 years, 1 month ago
Answer is A
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: A
Went with A
upvoted 1 times
...
John_Pongthorn
2 years, 8 months ago
Selected Answer: A
A. The newer equivalent is Vertex AI Pipelines, built on Kubeflow.
upvoted 2 times
...
Fatiy
2 years, 9 months ago
Selected Answer: A
A : In this case, it would be a good fit as you need to retrain your model every month, which can be automated with Kubeflow Pipelines. This makes it easier to manage the entire process, from training to deploying, in a streamlined and scalable manner.
upvoted 1 times
...
EFIGO
2 years, 11 months ago
Selected Answer: A
A is correct. All the options get you to the required result, but only A follows the Google-recommended best practices.
upvoted 1 times
...
abhi0706
3 years ago
Answer is A: Kubeflow Pipelines can form an end-to-end architecture
upvoted 1 times
...
GCP72
3 years, 2 months ago
Selected Answer: A
Correct answer is "A"
upvoted 1 times
...
caohieu04
3 years, 8 months ago
Selected Answer: A
Community vote
upvoted 2 times
...
lordcenzin
3 years, 8 months ago
Selected Answer: A
A for me too. KF provides all the end2end tools to perform what is asked
upvoted 2 times
...
gcper
4 years, 2 months ago
A. Kubeflow can handle all of those things, including deploying to a model endpoint for real-time serving.
upvoted 2 times
...
celia20200410
4 years, 3 months ago
ANS: A https://medium.com/google-cloud/how-to-build-an-end-to-end-propensity-to-purchase-solution-using-bigquery-ml-and-kubeflow-pipelines-cd4161f734d9#75c7 To automate this model-building process, you will orchestrate the pipeline using Kubeflow Pipelines, ‘a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers.’
upvoted 6 times
q4exam
4 years, 2 months ago
I think both A and D are correct, because they are just different ways of doing ML...
upvoted 1 times
ms_lemon
4 years, 1 month ago
But D doesn't follow Google best practices
upvoted 2 times
george_ognyanov
4 years ago
Answer seems to be A really. Here is a link from Google-recommended best practices. They are talking about Vertex AI Pipelines, which are essentially Kubeflow. https://cloud.google.com/architecture/ml-on-gcp-best-practices?hl=en#machine-learning-workflow-orchestration
upvoted 3 times
...
...
...
...
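The "training to deploying" workflow the thread settles on (option A) can be sketched in plain Python. This is a hedged illustration only, not real Kubeflow Pipelines or AI Platform code: every function name here (`extract_data`, `train_model`, `deploy_model`, `monthly_pipeline`) is a hypothetical placeholder for a pipeline component.

```python
# Hedged sketch: the multi-step monthly workflow of option A, with plain
# Python functions standing in for Kubeflow Pipelines components.
from datetime import date

def extract_data(month: str) -> list:
    # placeholder for the data-ingestion step (e.g. reading route delays)
    return [("route_1", 4.2), ("route_2", 7.9)]

def train_model(rows: list) -> dict:
    # placeholder training step: the "model" is just the mean delay here
    mean_delay = sum(delay for _, delay in rows) / len(rows)
    return {"mean_delay": mean_delay}

def deploy_model(model: dict) -> str:
    # placeholder deployment step returning a serving-endpoint id
    return f"endpoint-{date.today():%Y-%m}"

def monthly_pipeline(month: str) -> str:
    # the DAG a Kubeflow Pipeline would encode; a monthly recurring run
    # would invoke this once per retraining cycle
    rows = extract_data(month)
    model = train_model(rows)
    return deploy_model(model)

print(monthly_pipeline("2024-01"))
```

In a real deployment each function would be a containerized pipeline component, and the monthly schedule would come from a recurring pipeline run; encoding the whole chain in one orchestrated DAG is what makes option A the end-to-end choice.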

Topic 1 Question 9

You are developing ML models with AI Platform for image segmentation on CT scans. You frequently update your model architectures based on the newest available research papers, and have to rerun training on the same dataset to benchmark their performance. You want to minimize computation costs and manual intervention while having version control for your code. What should you do?

  • A. Use Cloud Functions to identify changes to your code in Cloud Storage and trigger a retraining job.
  • B. Use the gcloud command-line tool to submit training jobs on AI Platform when you update your code.
  • C. Use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository.
  • D. Create an automated workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a sensor.
Suggested Answer: C 🗳️

Comments

celia20200410
Highly Voted 4 years, 3 months ago
ANS:C CI/CD for Kubeflow pipelines. At the heart of this architecture is Cloud Build, infrastructure. Cloud Build can import source from Cloud Source Repositories, GitHub, or Bitbucket, and then execute a build to your specifications, and produce artifacts such as Docker containers or Python tar files.
upvoted 29 times
q4exam
4 years, 2 months ago
I think B might make sense given the compute-cost concern: there might be many version changes, but not all of them should trigger a compute job.
upvoted 3 times
...
...
chohan
Highly Voted 4 years, 4 months ago
Should be C https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build#cicd_architecture
upvoted 10 times
...
harithacML
Most Recent 1 year, 1 month ago
Selected Answer: C
Req: frequently rerun training + minimize computation costs + zero manual intervention + version control for your code. A. Use Cloud Functions to identify changes to your code in Cloud Storage and trigger a retraining job: no version control. B. Use the gcloud command-line tool to submit training jobs on AI Platform when you update your code: needs manual intervention for each gcloud CLI submission. C. Use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed: yes; connects to GitHub-like version control, is automated (zero manual intervention), and triggers only on code changes (cost is unclear compared to the other options). D. Create an automated workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a sensor: a daily sensor is too much, and none of the requirements are met.
upvoted 2 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: C
C) It is the only answer with version control.
upvoted 2 times
...
HaiMinhNguyen
1 year, 9 months ago
C is indeed the most logical, but I do not see anything addressing the cost concern. Does anyone have an explanation?
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
fredcaram
2 years, 7 months ago
Selected Answer: C
C follows a best practice, B is a manual step
upvoted 1 times
...
EFIGO
2 years, 11 months ago
Selected Answer: C
C is the correct answer; it's the Google-recommended approach. Checking for changes in code without using Cloud Source Repositories is a bad choice, so not A or B; Cloud Composer is overkill, so not D.
upvoted 1 times
...
abhi0706
3 years ago
Answer is C
upvoted 1 times
...
GCP72
3 years, 2 months ago
Selected Answer: C
Correct answer is "C"
upvoted 2 times
...
Mohamed_Mossad
3 years, 5 months ago
C is the best answer because "having version control for your code"
upvoted 1 times
...
caohieu04
3 years, 8 months ago
Selected Answer: C
Community vote
upvoted 3 times
...
NamitSehgal
3 years, 10 months ago
C cloudbuild
upvoted 1 times
...
ashii007
3 years, 11 months ago
B is definitely wrong because it will require manual intervention. The question specifically states the objective of minimal manual intervention. C is the way to go.
upvoted 1 times
...
alphard
3 years, 11 months ago
My answer is C. CI/CD/CT is executed in Cloud Build.
upvoted 1 times
...
mousseUwU
4 years ago
C is correct
upvoted 1 times
...
gcper
4 years, 2 months ago
C Cloud Build + Source Repository triggers for CI/CD
upvoted 2 times
...
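The option-C flow the comments describe (push to a source repository, Cloud Build fires, a training job is submitted) can be sketched in plain Python. This is a hedged illustration, not the Cloud Build API: `submit_training_job`, `on_push`, and the event structure are hypothetical placeholders.

```python
from typing import Optional

def submit_training_job(commit_sha: str) -> str:
    # placeholder for the step that would run
    # `gcloud ai-platform jobs submit training ...` inside a Cloud Build step
    return f"job_{commit_sha[:7]}"

def on_push(event: dict) -> Optional[str]:
    # Cloud Build-style trigger logic: fire only for pushes to the main
    # branch, so every training run is tied to a version-controlled commit
    # (the "version control for your code" requirement)
    if event.get("branch") != "main":
        return None
    return submit_training_job(event["commit_sha"])

print(on_push({"branch": "main", "commit_sha": "a1b2c3d4e5"}))
print(on_push({"branch": "experiment", "commit_sha": "d4e5f6a7b8"}))
```

The point of the sketch: with option C the trigger is the commit itself, so no polling (options A and D) and no manual submission (option B) is needed.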

Topic 1 Question 10

Your team needs to build a model that predicts whether images contain a driver's license, passport, or credit card. The data engineering team already built the pipeline and generated a dataset composed of 10,000 images with driver's licenses, 1,000 images with passports, and 1,000 images with credit cards. You now have to train a model with the following label map: ['drivers_license', 'passport', 'credit_card']. Which loss function should you use?

  • A. Categorical hinge
  • B. Binary cross-entropy
  • C. Categorical cross-entropy
  • D. Sparse categorical cross-entropy
Suggested Answer: D 🗳️

Comments

ransev
Highly Voted 4 years, 4 months ago
Answer is C
upvoted 23 times
gcp2021go
4 years, 4 months ago
Use sparse categorical crossentropy when your classes are mutually exclusive (e.g. when each sample belongs exactly to one class) and categorical crossentropy when one sample can have multiple classes or labels are soft probabilities (like [0.5, 0.3, 0.2]).
upvoted 9 times
GogoG
4 years ago
Definitely C - the target variable label formulated in the question requires a categorical cross entropy loss function i.e. 3 columns 'drivers_license' , 'passport', 'credit_card' that can take values 1, 0. Meanwhile sparse categorical cross entropy would require the labels to be integer encoded in a single vector, for example, 'drivers_license' = 1, 'passport' = 2, 'credit_card' = 3.
upvoted 8 times
Jarek7
2 years, 4 months ago
Actually, it is exactly the opposite. Your label map has 3 mutually exclusive options; a document cannot be both a driver's license and a passport. The output is a SPARSE vector: only one of the categorical outputs is valid for a single example.
upvoted 1 times
Jarek7
2 years, 4 months ago
No, I'm sorry, I wrote it before checking; you were right. We use sparse categorical cross-entropy when we have just an index (integer) as a label. The only difference is that it decodes the integer into a one-hot representation that suits our DNN output.
upvoted 1 times
...
...
desertlotus1211
10 months, 4 weeks ago
Wrong; it's [0, 1, 2]
upvoted 1 times
...
...
...
OpenKnowledge
2 months, 1 week ago
Answer is C. Binary cross-entropy is for binary classification, while categorical cross-entropy is for multi-class classification (though both work for binary classification). Both categorical and sparse categorical cross-entropy are equally effective for multi-class classification; the only real difference lies in the label format. Use categorical cross-entropy when the labels are already one-hot encoded; use sparse categorical cross-entropy when labels are integers. Sparse categorical cross-entropy provides efficient, faster training and better memory usage than categorical cross-entropy. Categorical cross-entropy is suitable for smaller datasets with manageable class counts; sparse categorical cross-entropy is ideal for raw or large datasets with many classes.
upvoted 1 times
...
...
gcp2021go
Highly Voted 4 years, 5 months ago
answer is D https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/
upvoted 10 times
giaZ
3 years, 7 months ago
Literally from the link you posted: "A possible cause of frustration when using cross-entropy with classification problems with a large number of labels is the one hot encoding process. [...] This can mean that the target element of each training example may require a one hot encoded vector with tens or hundreds of thousands of zero values, requiring significant memory. Sparse cross-entropy addresses this by performing the same cross-entropy calculation of error, without requiring that the target variable be one hot encoded prior to training". Here we have 3 categories...No problem doing one-hot encoding. Answer: C
upvoted 2 times
...
ori5225
4 years, 3 months ago
Use sparse categorical crossentropy when your classes are mutually exclusive (e.g. when each sample belongs exactly to one class) and categorical crossentropy when one sample can have multiple classes or labels are soft probabilities (like [0.5, 0.3, 0.2]).
upvoted 3 times
...
...
billyst41
Most Recent 2 months ago
Selected Answer: D
I can see how C would work, but D is more efficient, and these are mutually exclusive. I haven't taken my test yet, so any advice is appreciated.
upvoted 1 times
...
nick987654
4 months, 2 weeks ago
Selected Answer: D
Answer is D. The official Professional MLE Study Guide by Mona Mona has a similar question in Chapter 7 Model Building (Page 138) and their explanation in the answers section is "In case of multiclass classification problems, we use sparse categorical cross-entropy". Their reference material in the book is page 128 "Use sparse categorical cross-entropy when your classes are mutually exclusive (when each sample belongs exactly to one class) and categorical cross-entropy when one sample can have multiple classes or labels".
upvoted 2 times
...
3cc17c7
4 months, 3 weeks ago
Selected Answer: D
'driver licence' may be more common than others, which can cause the model to more easily predict the common classes while inaccurately predicting the uncommon classes.
upvoted 1 times
...
Googlegeek
5 months, 2 weeks ago
Selected Answer: C
This is also for multi-class classification, but it's used when your labels are integer encoded (e.g., 0 for driver's license, 1 for passport, 2 for credit card) rather than one-hot encoded. While you could use this if you converted your labels, given the explicit label map as a list, categorical cross-entropy assumes a one-hot encoding which is common practice.
upvoted 1 times
...
gvk1
7 months ago
Selected Answer: C
One-hot encoding is needed here, as the label values for an image can be 001, 010, or 100.
upvoted 1 times
...
shahriar096
7 months, 1 week ago
Selected Answer: C
It should be categorical cross entropy
upvoted 1 times
...
ddeveloperr
9 months, 1 week ago
Selected Answer: D
Since the problem is a multi-class classification task (choosing between drivers_license, passport, and credit_card), you need a loss function designed for multi-class classification. Sparse Categorical Cross-Entropy is the best choice because: The labels are integer-encoded (not one-hot encoded). It is computationally more efficient than Categorical Cross-Entropy when dealing with class indices instead of one-hot vectors. Why not the others? A (Categorical Hinge): Used for multi-class classification with hinge loss, typically for SVMs, not neural networks. B (Binary Cross-Entropy): Used for binary classification (two classes), while this problem has three classes. C (Categorical Cross-Entropy): Works for multi-class classification but requires one-hot encoded labels, whereas the dataset likely uses integer labels.
upvoted 1 times
...
vishalzade29
9 months, 1 week ago
Selected Answer: C
Categorical cross-entropy is suitable for multi-class classification problems where each instance belongs to one and only one class. Since you have multiple classes (driver's license, passport, credit card), this loss function is appropriate.
upvoted 1 times
...
arjun2025
9 months, 1 week ago
Selected Answer: D
In case of multiclass classification problems, we use sparse categorical cross‐entropy.
upvoted 1 times
...
strafer
9 months, 2 weeks ago
Selected Answer: C
Because you have a multi-class classification problem with mutually exclusive classes and a label map, categorical cross-entropy is the most suitable and commonly used loss function.
upvoted 1 times
...
moammary
9 months, 3 weeks ago
Selected Answer: C
The answer is C. No need to overthink it, as sparse categorical cross-entropy is used for a sparse matrix, which is not the case here.
upvoted 1 times
...
kongae
10 months, 2 weeks ago
Selected Answer: C
The answer would be D if the label values were integers, but they are strings, so I will go for C.
upvoted 1 times
...
desertlotus1211
10 months, 4 weeks ago
Selected Answer: D
The label map [driver's_license, passport, credit_card] naturally maps to 0, 1, 2 as per machine learning standards. Which is used in Sparse categorical cross-entropy
upvoted 1 times
...
rajshiv
11 months, 1 week ago
Selected Answer: C
It is C. D will be appropriate only if the labels are integers which is not true in this case.
upvoted 1 times
...
joqu
11 months, 3 weeks ago
Selected Answer: D
The question clearly says "You now have to train a model with the following LABEL MAP". Label map is not one-hot encoding.
upvoted 1 times
...

Topic 1 Question 11

You are designing an ML recommendation model for shoppers on your company's ecommerce website. You will use Recommendations AI to build, test, and deploy your system. How should you develop recommendations that increase revenue while following best practices?

  • A. Use the "Other Products You May Like" recommendation type to increase the click-through rate.
  • B. Use the "Frequently Bought Together" recommendation type to increase the shopping cart size for each order.
  • C. Import your user events and then your product catalog to make sure you have the highest quality event stream.
  • D. Because it will take time to collect and record product data, use placeholder values for the product catalog to test the viability of the model.
Suggested Answer: B 🗳️

Comments

chohan
Highly Voted 4 years, 4 months ago
Answer should be B https://cloud.google.com/recommendations-ai/docs/placements#rps
upvoted 19 times
...
Celia20210714
Highly Voted 4 years, 3 months ago
ANS:B https://cloud.google.com/recommendations-ai/docs/placements#fbt Frequently bought together (shopping cart expansion) The "Frequently bought together" recommendation predicts items frequently bought together for a specific product within the same shopping session. If a list of products is being viewed, then it predicts items frequently bought with that product list. This recommendation is useful when the user has indicated an intent to purchase a particular product (or list of products) already, and you are looking to recommend complements (as opposed to substitutes). This recommendation is commonly displayed on the "add to cart" page, or on the "shopping cart" or "registry" pages (for shopping cart expansion).
upvoted 9 times
...
desertlotus1211
Most Recent 1 year ago
FYI - now known as Vertex AI Search
upvoted 3 times
...
eico
1 year, 2 months ago
Selected Answer: B
Answer is B. The Frequently Bought Together (shopping cart expansion) model is recommended to increase revenue. Option C is wrong because we should import the catalog first, then bring in the user events; otherwise the events will be unjoined. https://cloud.google.com/retail/docs/user-events#retail-reqs
upvoted 1 times
...
chirag2506
1 year, 4 months ago
Selected Answer: B
ans is B
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: B
B) To increase revenue, expand shopping cart with other items frequently bought together.
upvoted 1 times
...
harithacML
2 years, 4 months ago
Selected Answer: B
Req: ML recommendations + increase revenue + best practices.
A. Use the "Other Products You May Like" recommendation type to increase the click-through rate: "You may like"? No.
B. Use the "Frequently Bought Together" recommendation type to increase the shopping cart size for each order: viable with the company's purchase information. Also the basic recommendation to get started with: cross-sell and upsell.
C. Import your user events and then your product catalog to make sure you have the highest quality event stream: ensuring quality? This addresses data quality, but does not bring in more sales.
D. Because it will take time to collect and record product data, use placeholder values for the product catalog to test the viability of the model: dummy placeholder values? No value added to sales.
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 1 times
...
Yajnas_arpohc
2 years, 7 months ago
Selected Answer: C
https://cloud.google.com/recommendations-ai/docs/overview
upvoted 1 times
...
EFIGO
2 years, 11 months ago
Selected Answer: B
B directly impact the revenue
upvoted 1 times
...
GCP72
3 years, 2 months ago
Selected Answer: B
Correct answer is "B"
upvoted 1 times
...
caohieu04
3 years, 8 months ago
Selected Answer: B
Community vote
upvoted 2 times
...
NamitSehgal
3 years, 10 months ago
Event Data is important along with product data but I am not sure if there is a catch here, what goes first https://github.com/GoogleCloudPlatform/analytics-componentized-patterns/blob/master/retail/recommendation-system/bqml/bqml_retail_recommendation_system.ipynb
upvoted 1 times
...
ramen_lover
4 years ago
I don't know the correct answer, but it seems C and D are not correct:
- "Do not record user events for product items that have not been imported yet."; i.e., import your product catalog first and then your user events.
- "Make sure that all required catalog information is included and correct. Do not use dummy or placeholder values."
https://cloud.google.com/retail/recommendations-ai/docs/upload-catalog#catalog_import_best_practices
I think the correct answer is B, because the "default optimization objective" for FBT is "revenue per order", whereas the "default optimization objective" for OYML is "click-through rate". https://cloud.google.com/retail/recommendations-ai/docs/placements#fbt
upvoted 4 times
...
mousseUwU
4 years ago
Sense is B
upvoted 1 times
...
gcp2021go
4 years, 5 months ago
The correct answer should be C; there is a diagram on the webpage that discusses how it works. https://cloud.google.com/recommendations
upvoted 5 times
sensev
4 years, 3 months ago
I think B is the correct answer instead of C, since B directly contributes to increasing revenue.
upvoted 2 times
...
...

Topic 1 Question 12


Exam Professional Machine Learning Engineer topic 1 question 12 discussion

You are designing an architecture with a serverless ML system to enrich customer support tickets with informative metadata before they are routed to a support agent. You need a set of models to predict ticket priority, predict ticket resolution time, and perform sentiment analysis to help agents make strategic decisions when they process support requests. Tickets are not expected to have any domain-specific terms or jargon.
The proposed architecture has the following flow:

Which endpoints should the Enrichment Cloud Functions call?

  • A. 1 = AI Platform, 2 = AI Platform, 3 = AutoML Vision
  • B. 1 = AI Platform, 2 = AI Platform, 3 = AutoML Natural Language
  • C. 1 = AI Platform, 2 = AI Platform, 3 = Cloud Natural Language API
  • D. 1 = Cloud Natural Language API, 2 = AI Platform, 3 = Cloud Vision API
Suggested Answer: C 🗳️

Comments

Celia20210714
Highly Voted 4 years, 3 months ago
ANS: C https://cloud.google.com/architecture/architecture-of-a-serverless-ml-model#architecture
The architecture has the following flow:
- A user writes a ticket to Firebase, which triggers a Cloud Function.
- The Cloud Function calls 3 different endpoints to enrich the ticket:
  - An AI Platform endpoint, where the function can predict the priority.
  - An AI Platform endpoint, where the function can predict the resolution time.
  - The Natural Language API to do sentiment analysis and word salience.
- For each reply, the Cloud Function updates the Firebase real-time database.
- The Cloud Function then creates a ticket into the helpdesk platform using the RESTful API.
upvoted 30 times
...
gcp2021go
Highly Voted 4 years, 5 months ago
the answer should be C. The tickets do not include specific terms , which means, it doesn't need to be custom built. thus, we can use cloud NLP API instead of automl NLP.
upvoted 18 times
...
OpenKnowledge
Most Recent 1 month, 1 week ago
Selected Answer: C
AutoML is not inherently a serverless technology
upvoted 1 times
...
wishyrater
1 year, 1 month ago
Selected Answer: C
ANS: C Tickets are not expected to have any domain-specific terms or jargon. Therefore we can use the Natural Language API, and we don't need to train our own model.
upvoted 2 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: C
C) Eliminate A and D, as no vision or image processing is required. B (AutoML Natural Language) requires custom training, while C (Cloud Natural Language API) gives you sentiment analysis out of the box.
upvoted 4 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: C
C - as Natural Language API has sentiment analysis and using the API over a custom model is always preferred
upvoted 4 times
...
harithacML
2 years, 4 months ago
Selected Answer: C
Req: serverless ML system + models to predict ticket priority, predict ticket resolution time, and perform sentiment analysis.
A. 1 = AI Platform, 2 = AI Platform, 3 = AutoML Vision: no image data as input here, only text (NLP).
B. 1 = AI Platform, 2 = AI Platform, 3 = AutoML Natural Language: only sentiment is needed for the 3rd endpoint, and no custom model is needed (https://cloud.google.com/natural-language/automl/docs/beginners-guide), so AutoML is not required.
C. 1 = AI Platform, 2 = AI Platform, 3 = Cloud Natural Language API: 1 for classification (priority: high/medium/low), 2 for resolution-time regression, 3 for sentiment analysis — the Cloud Natural Language API is enough.
D. 1 = Cloud Natural Language API, 2 = AI Platform, 3 = Cloud Vision API: no image data.
upvoted 4 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
wish0035
2 years, 11 months ago
Selected Answer: C
ANS: C This is the exact solution by Google: https://web.archive.org/web/20210618072649/https://cloud.google.com/architecture/architecture-of-a-serverless-ml-model#architecture
upvoted 2 times
...
jespinosal
2 years, 11 months ago
Selected Answer: B
ANS: B. You need to train custom regression models (AutoML), as the NLP API is not going to be able to rank your priority and evaluate the time.
upvoted 1 times
...
jespinosal
2 years, 11 months ago
ANS: C, as the NLP API is not able to perform custom regression models (predict time) and priority; you need AutoML or to train your own for those.
upvoted 1 times
...
EFIGO
2 years, 11 months ago
Selected Answer: C
AI Platform (now Vertex AI) for both the predictions and Natural Language API for sentiment analysis since there are no specific terms (so no need to custom build something with an AutoML), so C
upvoted 2 times
...
GCP72
3 years, 2 months ago
Selected Answer: C
Correct answer is "C"
upvoted 1 times
...
Mohamed_Mossad
3 years, 5 months ago
Selected Answer: C
- By elimination, A and D must be dropped: we have no vision tasks in this system.
- Between B and C: the question states "no specific domain or jargon", so the Natural Language API is preferred over AutoML, since there are no custom entities and no custom training. So I vote for C.
upvoted 2 times
...
caohieu04
3 years, 8 months ago
Selected Answer: C
Community vote
upvoted 4 times
...
alphard
3 years, 11 months ago
Mine is C. Priority prediction is categorical. Resolution time is linear regression. Sentiment is an NLP problem.
upvoted 2 times
...
chohan
4 years, 4 months ago
Should be B, don't forget the domain specific terms and jargons https://medium.com/google-cloud/analyzing-sentiment-of-text-with-domain-specific-vocabulary-and-topics-726b8f287aef
upvoted 1 times
gcp2021go
4 years, 4 months ago
the question said "Tickets are not expected to have any domain-specific terms or jargon."
upvoted 7 times
...
...

Topic 1 Question 13


Exam Professional Machine Learning Engineer topic 1 question 13 discussion

You have trained a deep neural network model on Google Cloud. The model has low loss on the training data, but is performing worse on the validation data. You want the model to be resilient to overfitting. Which strategy should you use when retraining the model?

  • A. Apply a dropout parameter of 0.2, and decrease the learning rate by a factor of 10.
  • B. Apply a L2 regularization parameter of 0.4, and decrease the learning rate by a factor of 10.
  • C. Run a hyperparameter tuning job on AI Platform to optimize for the L2 regularization and dropout parameters.
  • D. Run a hyperparameter tuning job on AI Platform to optimize for the learning rate, and increase the number of neurons by a factor of 2.
Suggested Answer: C 🗳️

Comments

chohan
Highly Voted 4 years, 4 months ago
Should be C https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/
upvoted 26 times
...
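The tuning job in option C is configured declaratively on AI Platform. A minimal sketch of an hptuning_config.yaml, under the assumption that the training application parses dropout_rate and l2_reg flags and reports a val_loss metric (all three names are illustrative, not from the question):

```yaml
# Sketch of an AI Platform hyperparameter tuning config.
# dropout_rate, l2_reg, and val_loss are hypothetical names that
# must match what the training application actually exposes.
trainingInput:
  hyperparameters:
    goal: MINIMIZE
    hyperparameterMetricTag: val_loss
    maxTrials: 20
    maxParallelTrials: 4
    params:
      - parameterName: dropout_rate
        type: DOUBLE
        minValue: 0.1
        maxValue: 0.6
        scaleType: UNIT_LINEAR_SCALE
      - parameterName: l2_reg
        type: DOUBLE
        minValue: 0.0001
        maxValue: 0.1
        scaleType: UNIT_LOG_SCALE
```

The service then searches both regularization knobs jointly, instead of committing to single hand-picked values like the 0.2 or 0.4 in options A and B.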
inder0007
Highly Voted 4 years, 5 months ago
increasing the size of the network will make the overfitting situation worse
upvoted 8 times
...
gvk1
Most Recent 7 months ago
Selected Answer: C
As it's a DNN, dropout helps. L1 or L2 regularization also increases generalization.
upvoted 1 times
...
chibuzorrr
11 months, 1 week ago
Selected Answer: C
C is the best answer. You cannot increase neurons, as the model is too complex already and cannot generalize!
upvoted 1 times
...
fragkris
1 year, 11 months ago
Selected Answer: C
Voted C
upvoted 1 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: C
A and B have very specific numbers, which doesn't guarantee success. C is best. D increases the size of the network, which does not help with overfitting.
upvoted 3 times
...
harithacML
2 years, 4 months ago
Selected Answer: C
Req: make the model resilient to overfitting.
A. Apply a dropout parameter of 0.2, and decrease the learning rate by a factor of 10: might or might not work, and may not find the optimal parameter set since it uses arbitrary values.
B. Apply an L2 regularization parameter of 0.4, and decrease the learning rate by a factor of 10: same issue — arbitrary values.
C. Run a hyperparameter tuning job on AI Platform to optimize for the L2 regularization and dropout parameters: L2 and dropout are regularization methods that would work; let the tuning job find how strongly these parameters should regularize. Yes, this would work.
D. Run a hyperparameter tuning job on AI Platform to optimize for the learning rate, and increase the number of neurons by a factor of 2: AI Platform would do the tuning, but adding neurons makes the network more complex, so we can eliminate this option.
upvoted 3 times
...
ashu381
2 years, 5 months ago
Selected Answer: C
It should be C as regularization (L1/L2), early stopping and drop out are some of the ways in deep learning to handle overfitting. Other options have specific values which may or may not solve overfitting as it depends on specific use case.
upvoted 2 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 2 times
...
wish0035
2 years, 11 months ago
Selected Answer: C
ANS: C. A and B use arbitrary values; why would they choose those values? D could increase overfitting even more, since you're using a more complex model.
upvoted 2 times
...
EFIGO
2 years, 11 months ago
Selected Answer: C
We don't know the optimum values for the parameters, so we need to run a hyperparameter tuning job; L2 regularization and dropout parameters are great ways to avoid overfitting. So C is the answer
upvoted 1 times
...
GCP72
3 years, 2 months ago
Selected Answer: C
Correct answer is "C"
upvoted 1 times
...
Mohamed_Mossad
3 years, 5 months ago
Selected Answer: C
By elimination, C and D are better than A and B (more automated, scalable). Between C and D, C is better, as D's "increase the number of neurons by a factor of 2" will make matters worse and increase overfitting.
upvoted 1 times
Mohamed_Mossad
3 years, 4 months ago
Also, in A and D, the learning rate has no direct relation to overfitting.
upvoted 1 times
...
...
morgan62
3 years, 7 months ago
Selected Answer: C
C for sure
upvoted 2 times
...
giaZ
3 years, 8 months ago
Selected Answer: C
Best practice is to let an AI Platform tuning job optimize the hyperparameters. Why should I trust the values in answers A or B? Plus, L2 regularization and dropout are the way to go here.
upvoted 2 times
...
caohieu04
3 years, 8 months ago
Selected Answer: C
Community vote
upvoted 2 times
...
wences
3 years, 9 months ago
Selected Answer: C
it is the logical ans
upvoted 3 times
...

Topic 1 Question 14


Exam Professional Machine Learning Engineer topic 1 question 14 discussion

You built and manage a production system that is responsible for predicting sales numbers. Model accuracy is crucial, because the production model is required to keep up with market changes. Since being deployed to production, the model hasn't changed; however, the accuracy of the model has steadily deteriorated.
What issue is most likely causing the steady decline in model accuracy?

  • A. Poor data quality
  • B. Lack of model retraining
  • C. Too few layers in the model for capturing information
  • D. Incorrect data split ratio during model training, evaluation, validation, and test
Suggested Answer: B 🗳️

Comments

esuaaaa
Highly Voted 4 years, 5 months ago
B. Retraining is needed as the market is changing.
upvoted 31 times
sensev
4 years, 3 months ago
I also think it is B - who is giving the "correct" answers to the questions? I feel like 4 out of 5 of them are incorrect.
upvoted 11 times
...
...
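The drift the commenters describe can be made concrete with a distribution check on incoming features. A pure-Python sketch, under the assumption that a Population Stability Index above roughly 0.25 (a commonly used but not universal threshold) would trigger retraining:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.

    Buckets both samples using quantile edges taken from `expected`
    (the training-time sample), then sums (a - e) * ln(a / e) over
    the per-bucket frequencies.
    """
    sorted_exp = sorted(expected)
    # Quantile-based bucket edges from the training-time sample.
    edges = [sorted_exp[int(i * (len(sorted_exp) - 1) / bins)]
             for i in range(1, bins)]

    def frequencies(sample):
        counts = [0] * bins
        for x in sample:
            idx = sum(1 for e in edges if x > e)  # bucket index
            counts[idx] += 1
        # Small floor avoids log(0) for empty buckets.
        return [max(c / len(sample), 1e-6) for c in counts]

    e_freq, a_freq = frequencies(expected), frequencies(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_freq, a_freq))

# Same distribution -> PSI near 0; a shifted market -> PSI grows.
train = [i % 100 for i in range(1000)]
shifted = [(i % 100) + 50 for i in range(1000)]
print(psi(train, train))    # stable: near zero
print(psi(train, shifted))  # drifted: well above a 0.25 threshold
```

A steady rise in this kind of metric on production inputs is the signal that the answer-B retraining is overdue.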
NickHapton
Highly Voted 3 years, 10 months ago
The biggest issue with this website is that all the `correct answers` are wrong.
upvoted 16 times
desertlotus1211
1 year ago
So what do you suggest the answer is ?
upvoted 1 times
...
...
ZhengWeiNg
Most Recent 1 year, 2 months ago
Selected Answer: B
Model is not updated to current sales trends.
upvoted 1 times
...
jsalvasoler
1 year, 3 months ago
Selected Answer: B
B naturally
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: B
B) You require model monitoring to identify changes and at the right time retrain the model with new data to avoid model drift.
upvoted 2 times
...
Azhar10
1 year, 7 months ago
Selected Answer: B
The market can be dynamic: sales trends, customer preferences, and even competitor strategies might evolve over time, but our model hasn't changed since deployment, so it can only adapt to these changes through retraining. Degradation over time: without retraining to adapt to these changes, the model's predictions become less accurate as the real world diverges from the data it was trained on.
upvoted 1 times
...
97a158e
1 year, 9 months ago
Given the constant changes in market data, the production model should be retrained regularly for better results. Option B is the right choice.
upvoted 1 times
...
fragkris
1 year, 11 months ago
Selected Answer: B
Keeping the model up to date is crucial. So - B.
upvoted 1 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: B
B because the environment is changing and the model only captures past performance
upvoted 1 times
...
harithacML
2 years, 4 months ago
Selected Answer: B
Situation: the model was trained long ago. Q: why has the accuracy of the model steadily deteriorated?
A. Poor data quality: quality issues should be handled by the pipeline and would not cause a steady performance decline over time.
B. Lack of model retraining: very obvious.
C. Too few layers in the model for capturing information: if so, the model would not have been deployed in the first place, due to low performance on unseen data.
D. Incorrect data split ratio during model training, evaluation, validation, and test: this is relevant only at training time, before the model was first deployed. We are way past that, so it is not the reason.
upvoted 2 times
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 1 times
...
niketd
2 years, 7 months ago
Selected Answer: B
B is correct. Model needs to keep up with the market changes, implying that the underlying data distribution would be changing as well. Hence retrain the model.
upvoted 1 times
...
tavva_prudhvi
2 years, 8 months ago
Selected Answer: B
The questions says the model is required to keep up with market changes, hence retraining needed.
upvoted 1 times
...
Ade_jr
2 years, 10 months ago
B is the correct answer
upvoted 1 times
...
wish0035
2 years, 11 months ago
Selected Answer: B
ANS: B
upvoted 1 times
...
EFIGO
2 years, 11 months ago
Selected Answer: B
Data distribution changes over time and so should do the model, so B is the correct answer
upvoted 1 times
...
GCP72
3 years, 2 months ago
Selected Answer: B
Correct answer is "B"
upvoted 1 times
...

Topic 1 Question 15


Exam Professional Machine Learning Engineer topic 1 question 15 discussion

You have been asked to develop an input pipeline for an ML training model that processes images from disparate sources at a low latency. You discover that your input data does not fit in memory. How should you create a dataset following Google-recommended best practices?

  • A. Create a tf.data.Dataset.prefetch transformation.
  • B. Convert the images to tf.Tensor objects, and then run Dataset.from_tensor_slices().
  • C. Convert the images to tf.Tensor objects, and then run tf.data.Dataset.from_tensors().
  • D. Convert the images into TFRecords, store the images in Cloud Storage, and then use the tf.data API to read the images for training.
Suggested Answer: D 🗳️

Comments

chohan
Highly Voted 3 years, 11 months ago
Should be D
upvoted 21 times
...
alphard
Highly Voted 3 years, 5 months ago
My option is D. Quote from Google's page: to construct a Dataset from data in memory, use tf.data.Dataset.from_tensors() or tf.data.Dataset.from_tensor_slices(). When input data is stored in a file (not in memory) in the recommended TFRecord format, you can use tf.data.TFRecordDataset(). In short: tf.data.Dataset.from_tensors()/from_tensor_slices() is for data in memory; tf.data.TFRecordDataset is for data in non-memory storage.
upvoted 17 times
...
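The distinction quoted above — materializing data in memory versus streaming it from sharded files — is essentially the generator pattern. A plain-Python sketch of the idea (file names and the line-per-record format are made up; tf.data.TFRecordDataset does the same thing lazily over binary TFRecord shards):

```python
import os
import tempfile

def write_shards(records, shard_size, directory):
    """Write records into fixed-size shard files (stand-in for TFRecords)."""
    paths = []
    for i in range(0, len(records), shard_size):
        path = os.path.join(directory, f"shard-{i // shard_size:05d}.txt")
        with open(path, "w") as f:
            f.write("\n".join(records[i:i + shard_size]))
        paths.append(path)
    return paths

def stream_records(paths):
    """Yield one record at a time; only one shard is ever open,
    so the full dataset never has to fit in memory."""
    for path in paths:
        with open(path) as f:
            for line in f:
                yield line.strip()

with tempfile.TemporaryDirectory() as d:
    shard_paths = write_shards([f"img-{i}" for i in range(10)],
                               shard_size=4, directory=d)
    first_three = [r for _, r in zip(range(3), stream_records(shard_paths))]
    print(first_three)  # ['img-0', 'img-1', 'img-2']
```

Option D applies this pattern with Cloud Storage as the shard store and tf.data handling the parallel reads and prefetching.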
RyanTan
Most Recent 8 months, 1 week ago
Selected Answer: A
This is one of the review questions in Chapter 3 of the book "Official Google Cloud Certified Professional Machine Learning Engineer Study Guide". The tf.data.Dataset.prefetch transformation decouples the time when data is produced from the time when data is consumed, so it can reduce latency; the transformation can also reduce memory usage. By the way, the tf.data.Dataset.interleave transformation can also be used to reduce latency and memory usage.
upvoted 2 times
...
PhilipKoku
11 months, 1 week ago
Selected Answer: D
D) Storing images in TFRecords optimises storage for images.
upvoted 3 times
...
pinimichele01
1 year ago
Selected Answer: D
tf.data.Dataset is for data in memory. tf.data.TFRecordDataset is for data in non-memory storage.
upvoted 3 times
...
samratashok
1 year, 2 months ago
Selected Answer: D
Why does this website show the wrong option as the answer? This is my observation across so many questions.
upvoted 3 times
...
fragkris
1 year, 5 months ago
Selected Answer: D
D is correct
upvoted 1 times
...
Sum_Sum
1 year, 6 months ago
Selected Answer: D
D because: tf.data.Dataset is for data in memory. tf.data.TFRecordDataset is for data in non-memory storage.
upvoted 2 times
...
boobyg1
1 year, 6 months ago
Selected Answer: D
all "correct" answers are wrong
upvoted 2 times
...
M25
2 years ago
Selected Answer: D
Went with D
upvoted 1 times
...
India_willsmith
2 years, 1 month ago
For all questions, the given answers and the voted answers are different. Which one should be considered for the exam?
upvoted 2 times
Alfredo_OSS
2 years ago
You should consider the voted ones.
upvoted 2 times
...
...
enghabeth
2 years, 3 months ago
Selected Answer: D
Converting your data into TFRecord has many advantages, such as:
- More efficient storage: TFRecord data can take up less space than the original data; it can also be partitioned into multiple files.
- Fast I/O: the TFRecord format can be read with parallel I/O operations, which is useful for TPUs or multiple hosts.
upvoted 1 times
...
enghabeth
2 years, 3 months ago
Selected Answer: D
my option is D
upvoted 1 times
...
Omi_04040
2 years, 4 months ago
Ans: D
upvoted 1 times
...
wish0035
2 years, 4 months ago
Selected Answer: D
ans: D
upvoted 1 times
...
EFIGO
2 years, 5 months ago
Selected Answer: D
For data in memory use tf.data.Dataset, for data in non-memory storage use tf.data.TFRecordDataset. Since data don't fit in memory, go with option D.
upvoted 1 times
...
GCP72
2 years, 8 months ago
Selected Answer: D
Correct answer is "D"
upvoted 1 times
...

Topic 1 Question 16


Exam Professional Machine Learning Engineer topic 1 question 16 discussion

You are an ML engineer at a large grocery retailer with stores in multiple regions. You have been asked to create an inventory prediction model. Your model's features include region, location, historical demand, and seasonal popularity. You want the algorithm to learn from new inventory data on a daily basis. Which algorithms should you use to build the model?

  • A. Classification
  • B. Reinforcement Learning
  • C. Recurrent Neural Networks (RNN)
  • D. Convolutional Neural Networks (CNN)
Suggested Answer: C 🗳️

Comments

esuaaaa
Highly Voted 4 years, 5 months ago
The answer is C. Use RNN because it is a time series analysis.
upvoted 30 times
...
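The "memory" property that makes an RNN fit a daily demand series can be seen in a miniature recurrent cell. A pure-Python sketch with fixed toy weights (nothing here is trained; w_h, w_x, and the inputs are illustrative):

```python
import math

def rnn_step(h_prev, x, w_h=0.5, w_x=1.0, b=0.0):
    """One recurrent step: the new hidden state mixes the previous
    state (memory) with today's input, squashed by tanh."""
    return math.tanh(w_h * h_prev + w_x * x + b)

def run_rnn(inputs, h0=0.0):
    """Unroll the cell over a sequence, returning all hidden states."""
    states, h = [], h0
    for x in inputs:
        h = rnn_step(h, x)
        states.append(h)
    return states

# Two demand series with the same final value but different
# histories end in different hidden states: the network "remembers".
high_history = run_rnn([0.9, 0.9, 0.1])[-1]
flat_history = run_rnn([0.0, 0.0, 0.1])[-1]
print(abs(high_history - flat_history) > 0.1)  # True
```

A feed-forward classifier or CNN sees only today's features; the recurrent state is what lets the model condition on historical demand and seasonality.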
george_ognyanov
Highly Voted 4 years, 1 month ago
As Y2Data pointed out, your reasoning for choosing B does not make much sense. Furthermore, Reinforcement Learning for this question does not make much sense to me. Reinforcement Learning is basically about agent-task problems. You give the agent a task, i.e. get out of a maze, and then through trial and error and many, many iterations the agent learns the correct way to perform the task. It is called Reinforcement because you, well, reinforce the agent: you reward the agent for correct choices and penalize incorrect ones. In RL you don't use much or any previous data, because the data is generated with each iteration, I think.
upvoted 7 times
...
gvk1
Most Recent 7 months ago
Selected Answer: C
The model needs to reflect memory, as the problem statement talks about previous-day history (near-term history), so an RNN fits.
upvoted 1 times
...
kamparia
1 year, 1 month ago
Selected Answer: B
I chose B because the model needs to learn.
upvoted 1 times
...
bludw
1 year, 4 months ago
Selected Answer: A
I would choose A, and only because the features already include time-series information (like demand); it would be far easier to train an XGBoost model than an RNN.
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: C
C) The best choice for this scenario would be C, Recurrent Neural Networks (RNN). Rationale: the task at hand is a time-series prediction problem, where the goal is to predict future inventory levels based on historical data. RNNs are particularly well-suited for such tasks because they have "memory" and can learn patterns in sequential data. Features like region, location, historical demand, and seasonal popularity can be used as input to the RNN. The network can then learn the temporal dependencies between these features and the inventory levels. RNNs can be trained incrementally, which means the model can be updated daily with new inventory data, allowing it to adapt to changing trends and patterns.
upvoted 3 times
...
vale_76_na_xxx
1 year, 10 months ago
go for C https://www.akkio.com/post/deep-learning-vs-reinforcement-learning-key-differences-and-use-cases#:~:text=Reinforcement%20learning%20is%20particularly%20well,of%20reinforcement%20learning%20in%20action.
upvoted 1 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: C
The question asks for a "prediction model". Classification and RL do not fit the bill, and CNNs are used for vision, so the only answer left is C.
upvoted 2 times
...
12112
2 years, 4 months ago
Selected Answer: C
I'm not sure that "daily basis" means it is a time series; it could mean updating the model daily. But I'll follow the collective intelligence.
upvoted 2 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
enghabeth
2 years, 9 months ago
Selected Answer: B
Reinforcement Learning(RL) is a type of machine learning technique that enables an agent to learn in an interactive environment by trial and error using feedback from its own actions and experiences.
upvoted 1 times
...
wish0035
2 years, 11 months ago
Selected Answer: C
ans: C
upvoted 1 times
...
EFIGO
2 years, 11 months ago
Selected Answer: C
RNN are a fit tool to work with time-series as this one, so C
upvoted 1 times
...
GCP72
3 years, 2 months ago
Selected Answer: C
Correct answer is "C"
upvoted 2 times
...
Mohamed_Mossad
3 years, 5 months ago
Selected Answer: C
"algorithm to learn from new inventory data on a daily basis" = time series model , best option to deal with time series is forsure RNN , vote for C
upvoted 1 times
...
morgan62
3 years, 7 months ago
Selected Answer: C
It's C.
upvoted 3 times
...
A4M
3 years, 9 months ago
C - for time series
upvoted 2 times
...

Topic 1 Question 17


Exam Professional Machine Learning Engineer topic 1 question 17 discussion

You are building a real-time prediction engine that streams files which may contain Personally Identifiable Information (PII) to Google Cloud. You want to use the
Cloud Data Loss Prevention (DLP) API to scan the files. How should you ensure that the PII is not accessible by unauthorized individuals?

  • A. Stream all files to Google Cloud, and then write the data to BigQuery. Periodically conduct a bulk scan of the table using the DLP API.
  • B. Stream all files to Google Cloud, and write batches of the data to BigQuery. While the data is being written to BigQuery, conduct a bulk scan of the data using the DLP API.
  • C. Create two buckets of data: Sensitive and Non-sensitive. Write all data to the Non-sensitive bucket. Periodically conduct a bulk scan of that bucket using the DLP API, and move the sensitive data to the Sensitive bucket.
  • D. Create three buckets of data: Quarantine, Sensitive, and Non-sensitive. Write all data to the Quarantine bucket. Periodically conduct a bulk scan of that bucket using the DLP API, and move the data to either the Sensitive or Non-Sensitive bucket.
Suggested Answer: D 🗳️

Comments

chohan
Highly Voted 4 years, 4 months ago
Should be D https://cloud.google.com/architecture/automating-classification-of-data-uploaded-to-cloud-storage#building_the_quarantine_and_classification_pipeline
upvoted 28 times
Swagluke
4 years, 2 months ago
All PII should be Sensitive data, that's why I think the answer is A.
upvoted 1 times
...
u_phoria
3 years, 4 months ago
Option D, as documented in that link (a fully automated process, using Cloud Functions - rather than a "periodic" scan as worded in the question), would be my choice. It's easier than B, which would work for a real-time scenario - but would require loads more custom work to implement (things like batching, segmentation, triggering). A and C are 'reactive' / periodic, and so not appropriate for the given scenario.
upvoted 1 times
...
...
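The quarantine flow in the architecture linked above reduces to a routing decision after the DLP scan. A hedged pure-Python sketch of that decision only (the bucket names and the findings list are illustrative; the real pipeline uses Cloud Functions plus the DLP API's inspect results, and EMAIL_ADDRESS etc. are built-in DLP infoType names):

```python
# infoTypes considered sensitive for this hypothetical pipeline.
SENSITIVE_INFOTYPES = {"EMAIL_ADDRESS", "PHONE_NUMBER",
                       "US_SOCIAL_SECURITY_NUMBER"}

def route_file(findings):
    """Decide the destination bucket for a quarantined file.

    `findings` mimics the list of infoType names a DLP inspect job
    would return for the file; any sensitive hit routes the file to
    the restricted bucket, otherwise it is released.
    """
    if any(f in SENSITIVE_INFOTYPES for f in findings):
        return "gs://example-sensitive"      # restricted ACLs
    return "gs://example-non-sensitive"      # broader access

print(route_file(["EMAIL_ADDRESS"]))  # gs://example-sensitive
print(route_file([]))                 # gs://example-non-sensitive
```

The point of option D is that files sit in the quarantine bucket, unreadable to unauthorized users, until this decision has been made; options A–C expose unscanned data first.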
maartenalexander
Highly Voted 4 years, 4 months ago
D; others pose risks
upvoted 6 times
...
br8ok
Most Recent 5 months, 2 weeks ago
Selected Answer: D
B says to write batches of data to BigQuery, which means it has to wait for the batch to fill, making it not real-time.
upvoted 2 times
...
joqu
11 months, 1 week ago
Selected Answer: B
Task: "You are building a real-time prediction engine". Solution D: " Periodically conduct a bulk scan" is NOT REAL TIME. It is the recommended architecture, but in order to satisfy the task, the cloud function would need to be triggered by each new data landing in GCP - which is totally doable and would be the right solution, but this is NOT WHAT THE ANSWER IS SAYING. Therefore B is the next best option (although not the recommended architecture design)
upvoted 1 times
...
Choisus
1 year ago
Selected Answer: B
Why not B? It requires real-time, right?
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: D
D) The best choice for this scenario would be D. Create three buckets of data: Quarantine, Sensitive, and Non-sensitive. Write all data to the Quarantine bucket. Periodically conduct a bulk scan of that bucket using the DLP API, and move the data to either the Sensitive or Non-Sensitive bucket.
upvoted 1 times
...
fragkris
1 year, 11 months ago
Selected Answer: D
D - The quarantine bucket is the Google-recommended approach
upvoted 2 times
...
tavva_prudhvi
2 years ago
Selected Answer: D
Option B does not provide a clear separation between sensitive and non-sensitive data before it is written to BigQuery, which means that PII might be exposed during the process. But, in D offers a better level of security by writing all the data to a Quarantine bucket first. This way, the DLP API can scan and categorize the data into Sensitive or Non-sensitive buckets before it is further processed or stored. This ensures that PII is not accessible by unauthorized individuals, as the sensitive data is identified and separated from the non-sensitive data before any further actions are taken.
upvoted 1 times
...
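The routing step behind option D can be sketched as plain Python. This is a minimal sketch assuming a Cloud DLP inspection has already produced a list of findings per object; the bucket names, the `route_object` helper, and the shape of `findings` are all illustrative, not part of the documented pipeline.

```python
# Sketch of the quarantine routing in option D. Assumption: a DLP inspection
# job has already run over an object in the Quarantine bucket and returned a
# (possibly empty) list of PII infoType names; bucket names are illustrative.
QUARANTINE = "quarantine-bucket"
SENSITIVE = "sensitive-bucket"
NON_SENSITIVE = "non-sensitive-bucket"

def route_object(findings):
    """Pick the destination bucket for an object scanned by DLP.

    Any finding at all means the object is treated as sensitive.
    """
    return SENSITIVE if findings else NON_SENSITIVE

# Everything lands in QUARANTINE first; after the scan each object is moved
# to exactly one of the two destination buckets.
assert route_object(["EMAIL_ADDRESS", "PHONE_NUMBER"]) == SENSITIVE
assert route_object([]) == NON_SENSITIVE
```

The point of the pattern is that no data is readable outside the quarantine bucket until this decision has been made.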
harithacML
2 years, 4 months ago
Selected Answer: D
real-time prediction engine, that streams files to Google Cloud. PII is not accessible by unauthorized individuals. D
upvoted 1 times
...
Liting
2 years, 4 months ago
Selected Answer: D
D should be the correct answer
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: D
Went with D
upvoted 2 times
...
lucaluca1982
2 years, 6 months ago
Selected Answer: B
B is real time
upvoted 1 times
...
dfdrin
2 years, 7 months ago
Selected Answer: D
It's D
upvoted 1 times
...
enghabeth
2 years, 9 months ago
Selected Answer: B
A, C, and D do not apply to a real-time case; all three say that the scan is applied periodically. So it's B.
upvoted 3 times
tavva_prudhvi
2 years, 8 months ago
Never mentioned periodically in the question, if I'm not wrong?
upvoted 1 times
...
...
guilhermebutzke
2 years, 10 months ago
Selected Answer: B
I think that is correct because of the "real time" application.
upvoted 1 times
...
EFIGO
2 years, 11 months ago
Selected Answer: D
D is the right answer: you can temporarily store the sensitive data in a Quarantine bucket with restricted access, then move the data to the relative buckets once the PII have been protected.
upvoted 2 times
...
GCP72
3 years, 2 months ago
Selected Answer: D
Correct answer is "D"
upvoted 1 times
...

Topic 1 Question 18


Exam Professional Machine Learning Engineer topic 1 question 18 discussion

You work for a large hotel chain and have been asked to assist the marketing team in gathering predictions for a targeted marketing strategy. You need to make predictions about user lifetime value (LTV) over the next 20 days so that marketing can be adjusted accordingly. The customer dataset is in BigQuery, and you are preparing the tabular data for training with AutoML Tables. This data has a time signal that is spread across multiple columns. How should you ensure that
AutoML fits the best model to your data?

  • A. Manually combine all columns that contain a time signal into an array. Allow AutoML to interpret this array appropriately. Choose an automatic data split across the training, validation, and testing sets.
  • B. Submit the data for training without performing any manual transformations. Allow AutoML to handle the appropriate transformations. Choose an automatic data split across the training, validation, and testing sets.
  • C. Submit the data for training without performing any manual transformations, and indicate an appropriate column as the Time column. Allow AutoML to split your data based on the time signal provided, and reserve the more recent data for the validation and testing sets.
  • D. Submit the data for training without performing any manual transformations. Use the columns that have a time signal to manually split your data. Ensure that the data in your validation set is from 30 days after the data in your training set, and that the data in your testing set is from 30 days after your validation set.
Suggested Answer: D 🗳️

Comments

kkd14
Highly Voted 4 years, 3 months ago
Should be D. As the time signal is spread across multiple columns, a manual split is required.
upvoted 25 times
sensev
4 years, 3 months ago
Also think it is D, since it mentioned that the time signal is spread across multiple columns.
upvoted 4 times
GogoG
4 years ago
Correct answer is C - AutoML handles training, validation, test splits automatically for you when you specify a Time column. There is no requirement to do this manually.
upvoted 7 times
george_ognyanov
4 years ago
Correct answer is D. It clearly says the time signal data is spread across different columns. If it weren't then C would be correct and your point would be valid. However, in this case the answer is D 100%. https://cloud.google.com/automl-tables/docs/data-best-practices#time
upvoted 10 times
irumata
3 years, 9 months ago
this comment is only about time information in different columns, not about time itself. C is correct as for me
upvoted 1 times
irumata
3 years, 9 months ago
but if time signal means time mark not the business signal the D is the correct - very controversial
upvoted 1 times
...
...
...
...
...
Werner123
1 year, 8 months ago
I think the answer is C. In this case I am interpreting the time signal as the features that hold predictive power as a function of time. There is no indication of how much data is available, so using the 30-days-after mark is not wise: you would only have 30 days' worth of data in the validation set. If you have a few years' worth of data, that seems like an unnecessarily small validation set.
upvoted 7 times
...
...
DucLee3110
Highly Voted 4 years, 4 months ago
C. You use the Time column to tell AutoML Tables that time matters for your data; it is not randomly distributed over time. When you specify the Time column, AutoML Tables uses the earliest 80% of the rows for training, the next 10% of rows for validation, and the latest 10% of rows for testing. AutoML Tables treats each row as an independent and identically distributed training example; setting the Time column does not change this. The Time column is used only to split the data set. You must include a value for the Time column for every row in your dataset. Make sure that the Time column has enough distinct values, so that the evaluation and test sets are non-empty. Usually, having at least 20 distinct values should be sufficient. https://cloud.google.com/automl-tables/docs/prepare#time
upvoted 14 times
salsabilsf
4 years, 3 months ago
From the link you provided, I think it's A : The Time column must have a data type of Timestamp. During schema review, you select this column as the Time column. (In the API, you use the timeColumnSpecId field.) This selection takes effect only if you have not specified the data split column. If you have a time-related column that you do not want to use to split your data, set the data type for that column to Timestamp but do not set it as the Time column.
upvoted 3 times
...
...
dija123
Most Recent 1 month, 1 week ago
Selected Answer: C
C is the Google-recommended best practice. Also, there is no way to split data manually in AutoML.
upvoted 1 times
...
b7ad1d9
1 month, 3 weeks ago
Selected Answer: D
Option D, because AutoML needs a clear Time column to auto-split the training, validation, and test sets. Since that is not provided, you have to manually split the records. Not a very well-formed question. From: https://cloud.google.com/vertex-ai/docs/tabular-data/bp-tabular "Provide a time signal: If the time information is not contained in a single column, you can use a manual data split to use the most recent data as the test data, and the earliest data as the training data. AutoML can
upvoted 1 times
...
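The manual, time-ordered split that option D describes can be sketched in a few lines. The `(event_date, features)` row layout and the 30-day offsets below are illustrative assumptions, not the question's actual schema.

```python
from datetime import date, timedelta

def chronological_split(rows, val_start, test_start):
    """Manual time-based split (option D): training gets the oldest data,
    validation starts 30 days later, testing 30 days after that.
    Each row is assumed to be a (event_date, features) tuple."""
    train = [r for r in rows if r[0] < val_start]
    val = [r for r in rows if val_start <= r[0] < test_start]
    test = [r for r in rows if r[0] >= test_start]
    return train, val, test

# Illustrative data: one row per day over 90 days.
start = date(2024, 1, 1)
rows = [(start + timedelta(days=i), {"ltv": i}) for i in range(90)]
train, val, test = chronological_split(
    rows,
    val_start=start + timedelta(days=30),
    test_start=start + timedelta(days=60),
)
assert len(train) == len(val) == len(test) == 30
# The split is strictly chronological: no future data leaks into training.
assert max(r[0] for r in train) < min(r[0] for r in val) < min(r[0] for r in test)
```

The key property, regardless of the exact offsets, is that validation and test data come strictly after the training data in time.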
TienH
6 months ago
Selected Answer: C
The correct answer is C: the model will be trained on earlier data and tested on more recent data, which matches how it will be used in production to predict future LTV. This chronological splitting is essential for accurate evaluation of time-series forecasting models.
upvoted 1 times
...
shahriar096
7 months, 1 week ago
Selected Answer: C
C is correct answer
upvoted 2 times
...
coupet
7 months, 1 week ago
Selected Answer: D
D is Correct - A time signal spread across multiple columns in a spreadsheet or data table would typically represent a time-series data where each column corresponds to a specific time point or interval, and the values in each column represent the signal's value at that time
upvoted 1 times
...
rajshiv
11 months, 1 week ago
Selected Answer: C
C is the right answer, as manually splitting data based on time adds unnecessary complexity. AutoML Tables can handle the time-based splits for us automatically when we specify the time column. Option D requires more manual intervention and introduces the risk of making errors in the data splitting process.
upvoted 1 times
...
Dirtie_Sinkie
1 year, 1 month ago
D could work, but I'm still leaning towards C
upvoted 1 times
...
nktyagi
1 year, 3 months ago
Selected Answer: C
AutoML handles training, validation, test splits automatically for you when you specify a Time column. There is no requirement to do this manually.
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: D
D) D is correct, as this would satisfy the days criterion mentioned in the question. 30 days is more than 20 days, and the prediction model can be used on a validation dataset to validate the results for the next 20 days.
upvoted 2 times
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: D
thinking that "spread across multiple columns" seems like "columns with redundant information," and considering how AutoML can deal with correlated columns, I think option C is the best choice, with no need for a manual split. However, "time information is not contained in a single column" is the same thing as "time signal that is spread across multiple columns." I agree that D could be the best option. Then, I tend to think that D is the best choice because the text could be more clearly expressed in redundant options.
upvoted 3 times
...
Mickey321
1 year, 11 months ago
Selected Answer: C
Either C or D, but leaning towards C, as I don't get the 30 days in D.
upvoted 2 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: D
"data has a time signal that is spread across multiple columns" - I interpret as having > 1 timeseries column. AutoML knows how to deal with a single column but not multiple hence answer is D
upvoted 2 times
...
Krish6488
2 years ago
Selected Answer: C
Since AutoML is good enough to perform the splits, C appears to be the right answer. Moreover, time information across multiple columns, which requires a manual split as per option D, is different from the question's scenario, where the time signal is spread across multiple columns which can be hours, months, days, etc. If we can define the right time signal column in AutoML, it's enough to split the data and pick the most recent data as test data and the earliest data as training data.
upvoted 1 times
...
atlas_lyon
2 years, 2 months ago
Selected Answer: D
A: Wrong. Even if the columns are combined into a 1D array (column), the time signal would still need to be indicated to AutoML; an automatic split cannot work, since we need more than 20 days of history.
B: Wrong. Without indicating the time signal to AutoML, data would leak (time leakage) into the training/validation/test sets.
C: Wrong, but might have been possible if the time signal weren't spread across multiple columns.
D: True, because a time signal spread across multiple columns requires manually splitting the data. Since we want to predict LTV over the next 20 days, it is necessary to have at least 20 days of history between the splits (30 seems okay: 10 days of margin). Validating and testing on the last 2 months seems reasonable for marketing purposes (usually seasonal).
upvoted 2 times
...
12112
2 years, 4 months ago
Why 30 days after each data set, even though we only need to predict 20 days?
upvoted 1 times
...

Topic 1 Question 19


Exam Professional Machine Learning Engineer topic 1 question 19 discussion

You have written unit tests for a Kubeflow Pipeline that require custom libraries. You want to automate the execution of unit tests with each new push to your development branch in Cloud Source Repositories. What should you do?

  • A. Write a script that sequentially performs the push to your development branch and executes the unit tests on Cloud Run.
  • B. Using Cloud Build, set an automated trigger to execute the unit tests when changes are pushed to your development branch.
  • C. Set up a Cloud Logging sink to a Pub/Sub topic that captures interactions with Cloud Source Repositories. Configure a Pub/Sub trigger for Cloud Run, and execute the unit tests on Cloud Run.
  • D. Set up a Cloud Logging sink to a Pub/Sub topic that captures interactions with Cloud Source Repositories. Execute the unit tests using a Cloud Function that is triggered when messages are sent to the Pub/Sub topic.
Suggested Answer: B 🗳️

Comments

maartenalexander
Highly Voted 4 years, 4 months ago
B. GCP recommends using Cloud Build when building Kubeflow Pipelines. It's possible to run unit tests in Cloud Build, and the others seem overly complex/unnecessary.
upvoted 19 times
...
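As a rough illustration of option B, the Cloud Build trigger on the development branch would run a build config like the one below on every push. The builder image, `requirements.txt`, and `tests/` layout are assumptions, not details from the question.

```yaml
# cloudbuild.yaml sketch: run the Kubeflow Pipeline unit tests on each push.
# The Python image, requirements file, and test directory are illustrative.
steps:
  # Install the custom libraries the unit tests depend on.
  - name: 'python:3.10'
    entrypoint: 'pip'
    args: ['install', '-r', 'requirements.txt', '--user']
  # Execute the unit tests.
  - name: 'python:3.10'
    entrypoint: 'python'
    args: ['-m', 'pytest', 'tests/']
```

The trigger itself is created in Cloud Build and pointed at the development branch of the Cloud Source Repositories repo, so no custom glue code is needed.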
mousseUwU
Highly Voted 4 years ago
B makes sense because of this: https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build#cicd_architecture
upvoted 9 times
mousseUwU
4 years ago
The image explains a lot
upvoted 2 times
...
...
bc3f222
Most Recent 8 months, 2 weeks ago
Selected Answer: B
GCP recommends using Cloud Build when building Kubeflow Pipelines.
upvoted 1 times
...
bludw
1 year, 4 months ago
Selected Answer: B
B: No need of any Pub/Sub stuff
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: B
B. Cloud Build.
upvoted 1 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: B
B is the only sensible answer, as it's a feature of Cloud Build; everything else is the delusion of a madman.
upvoted 2 times
...
SamuelTsch
2 years, 4 months ago
Selected Answer: B
A, C, and D need additional manual tasks. B is correct.
upvoted 1 times
...
Scipione_
2 years, 5 months ago
Selected Answer: B
Cloud Build is the best choice but the other answers are feasible.
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 1 times
...
enghabeth
2 years, 9 months ago
Selected Answer: B
Because it is the most automatic of the options
upvoted 1 times
...
wish0035
2 years, 11 months ago
Selected Answer: B
ans: B
upvoted 1 times
...
EFIGO
2 years, 11 months ago
Selected Answer: B
B is the Google-recommended best practice.
upvoted 1 times
...
GCP72
3 years, 2 months ago
Correct answer is "B"
upvoted 1 times
...
morgan62
3 years, 7 months ago
Selected Answer: B
B it is.
upvoted 2 times
...
Danny2021
4 years, 2 months ago
Easy one, B, Cloud Build is the tool for CI/CD.
upvoted 5 times
...

Topic 1 Question 20


Exam Professional Machine Learning Engineer topic 1 question 20 discussion

You are training an LSTM-based model on AI Platform to summarize text using the following job submission script:
gcloud ai-platform jobs submit training $JOB_NAME \
--package-path $TRAINER_PACKAGE_PATH \
--module-name $MAIN_TRAINER_MODULE \
--job-dir $JOB_DIR \
--region $REGION \
--scale-tier basic \
-- \
--epochs 20 \
--batch_size=32 \
--learning_rate=0.001 \
You want to ensure that training time is minimized without significantly compromising the accuracy of your model. What should you do?

  • A. Modify the 'epochs' parameter.
  • B. Modify the 'scale-tier' parameter.
  • C. Modify the 'batch size' parameter.
  • D. Modify the 'learning rate' parameter.
Suggested Answer: B 🗳️

Comments

maartenalexander
Highly Voted 4 years, 4 months ago
B. Changing the scale tier does not impact performance–only speeds up training time. Epochs, Batch size, and learning rate all are hyperparameters that might impact model accuracy.
upvoted 33 times
...
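Concretely, option B means changing only the `--scale-tier` flag in the question's script and leaving every hyperparameter untouched. The choice of `basic-gpu` below is just one possible upgrade, sketched from the script in the question.

```shell
# Same job submission as in the question, with only the scale tier upgraded
# (basic -> basic-gpu here, as an example); epochs, batch_size, and
# learning_rate are unchanged, so accuracy should not be affected.
gcloud ai-platform jobs submit training $JOB_NAME \
  --package-path $TRAINER_PACKAGE_PATH \
  --module-name $MAIN_TRAINER_MODULE \
  --job-dir $JOB_DIR \
  --region $REGION \
  --scale-tier basic-gpu \
  -- \
  --epochs 20 \
  --batch_size=32 \
  --learning_rate=0.001
```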
bc3f222
Most Recent 8 months, 2 weeks ago
Selected Answer: B
B is still correct, although scale-tier is now being replaced by exact machine configs instead.
upvoted 1 times
...
DaleR
1 year ago
B is correct; however, this parameter looks like it is being deprecated.
upvoted 4 times
...
desertlotus1211
1 year ago
The scale-tier parameter in AI Platform determines the computing resources (e.g., CPU, GPU, or TPU) that are allocated for your training job. By increasing the scale-tier from basic to a more powerful tier (e.g., standard, premium, or custom), you can allocate more resources (like GPUs or TPUs) for your job. This will significantly reduce training time, especially for LSTM-based models that benefit from parallel processing on GPUs or TPUs.
upvoted 3 times
desertlotus1211
1 year ago
Answer B
upvoted 1 times
...
...
SamuelTsch
2 years, 4 months ago
Selected Answer: B
A, C, D could impact the accuracy. But B not.
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 1 times
...
enghabeth
2 years, 9 months ago
Selected Answer: B
A is incorrect, less training iteration will affect model performance. B is correct, cost is not a concern as it is not mentioned in the question, the scale tier can be upgraded to significantly minimize the training time. C is incorrect, wouldn’t affect training time, but would affect model performance. D is incorrect, the model might converge faster with higher learning rate, but this would affect the training routine and might cause exploding gradients.
upvoted 2 times
...
ares81
2 years, 10 months ago
Selected Answer: B
It's B!
upvoted 1 times
...
EFIGO
2 years, 11 months ago
Selected Answer: B
A, C, D are all about hyperparameters that might impact model accuracy, while B is just about computing speed; so upgrading the scale tier will make the model faster with no chance of reducing accuracy.
upvoted 2 times
...
GCP72
3 years, 2 months ago
Selected Answer: B
Correct answer is "B"
upvoted 1 times
...
Mohamed_Mossad
3 years, 5 months ago
Selected Answer: B
- Using elimination: all options except B can harm the accuracy.
upvoted 3 times
...
morgan62
3 years, 7 months ago
Selected Answer: B
B for sure.
upvoted 2 times
...
igor_nov1
3 years, 8 months ago
Selected Answer: B
Might be helpful: https://cloud.google.com/ai-platform/training/docs/machine-types#scale_tiers
Google may optimize the configuration of the scale tiers for different jobs over time, based on customer feedback and the availability of cloud resources. Each scale tier is defined in terms of its suitability for certain types of jobs. Generally, the more advanced the tier, the more machines are allocated to the cluster, and the more powerful the specifications of each virtual machine. As you increase the complexity of the scale tier, the hourly cost of training jobs, measured in training units, also increases. See the pricing page to calculate the cost of your job.
upvoted 1 times
...
ashii007
3 years, 11 months ago
A, C, and D all point to hyperparameter tuning, which is not the objective in the question. As others have said, B is the only way to improve the training time of the model.
upvoted 3 times
...
santy79
3 years, 11 months ago
Selected Answer: B
examtopics, can we attach relevant docs explaining why C?
upvoted 1 times
...
mousseUwU
4 years ago
Correct is B, scale-tier is the definition of what GPU will be used: https://cloud.google.com/ai-platform/training/docs/using-gpus
upvoted 3 times
...
Y2Data
4 years, 1 month ago
Should be B. Question didn't say anything about cost, so while B would increase cost with more computation time, it would save real-world time.
upvoted 3 times
...

Topic 1 Question 21


Exam Professional Machine Learning Engineer topic 1 question 21 discussion

You have deployed multiple versions of an image classification model on AI Platform. You want to monitor the performance of the model versions over time. How should you perform this comparison?

  • A. Compare the loss performance for each model on a held-out dataset.
  • B. Compare the loss performance for each model on the validation data.
  • C. Compare the receiver operating characteristic (ROC) curve for each model using the What-If Tool.
  • D. Compare the mean average precision across the models using the Continuous Evaluation feature.
Suggested Answer: D 🗳️

Comments

chohan
Highly Voted 4 years, 4 months ago
Answer is D
upvoted 15 times
...
Sum_Sum
Highly Voted 1 year, 12 months ago
Selected Answer: D
D - because you are using a Google-provided feature. Remember, in this exam it's important to always choose the Google services over anything else.
upvoted 7 times
...
TienH
Most Recent 5 months, 4 weeks ago
Selected Answer: D
The correct answer is D. Continuous Evaluation is specifically designed for monitoring model performance in production environments.
upvoted 3 times
...
jkkim_jt
1 year ago
Selected Answer: D
[B] Compare the loss performance for each model on the validation data. --> Not validation data but testing data
upvoted 1 times
...
bludw
1 year, 4 months ago
Selected Answer: A
The answer is A. I am not sure why people choose B over A, as you may overfit your validation set, while you use your held-out set rarely == no option to overfit.
upvoted 3 times
RyanTan
8 months, 1 week ago
true. I guess people chose B because the official study guide said so but in my view that's obviously wrong.
upvoted 1 times
...
...
Wookjae
1 year, 5 months ago
Continuous Evaluation feature is deprecated.
upvoted 1 times
Goosemoose
1 year, 5 months ago
so it looks like B is the best answer
upvoted 2 times
...
Goosemoose
1 year, 5 months ago
so is the what if tool
upvoted 1 times
...
...
saadci
1 year, 5 months ago
Selected Answer: B
In the official study guide, this was the explanation given for answer B : "The image classification model is a deep learning model. You minimize the loss of deep learning models to get the best model. So comparing loss performance for each model on validation data is the correct answer."
upvoted 4 times
joqu
11 months, 1 week ago
you minimise loss DURING TRAINING to get the best model. you don't use it for performance monitoring of a deployed model
upvoted 3 times
...
...
claude2046
2 years, 1 month ago
mAP is for object detection, so the answer should be B
upvoted 3 times
...
Liting
2 years, 4 months ago
Selected Answer: D
Went with D, using continuous evaluation feature seems correct to me.
upvoted 1 times
...
SamuelTsch
2 years, 4 months ago
Selected Answer: D
I chose D myself. But after reading the post here https://www.v7labs.com/blog/mean-average-precision, I was not sure about D: it says mAP is commonly used for object detection or instance segmentation tasks. Validation dataset in the GCP context: data not trained on and not yet seen.
upvoted 1 times
...
Voyager2
2 years, 5 months ago
Selected Answer: D
D. Compare the mean average precision across the models using the Continuous Evaluation feature https://cloud.google.com/vertex-ai/docs/evaluation/introduction Vertex AI provides model evaluation metrics, such as precision and recall, to help you determine the performance of your models... Vertex AI supports evaluation of the following model types: AuPRC: The area under the precision-recall (PR) curve, also referred to as average precision. This value ranges from zero to one, where a higher value indicates a higher-quality model.
upvoted 2 times
...
M25
2 years, 6 months ago
Selected Answer: D
Went with D
upvoted 1 times
...
lucaluca1982
2 years, 6 months ago
Selected Answer: B
I go for B. Option D is good when we are already in production
upvoted 1 times
...
prakashkumar1234
2 years, 7 months ago
To monitor the performance of the model versions over time, you should compare the loss performance for each model on the validation data. Therefore, option B is the correct answer.
upvoted 1 times
Jarek7
2 years, 6 months ago
Please, how? B is not monitoring. It is a validation. The definition of monitoring states: "observe and check the progress or quality of (something) over a period of time." So it is a continuous process. Options A, B, and C are each just a one-time check, not monitoring.
upvoted 3 times
...
...
Fatiy
2 years, 8 months ago
Selected Answer: B
The best option to monitor the performance of multiple versions of an image classification model on AI Platform over time is to compare the loss performance for each model on the validation data. Option B is the best approach because comparing the loss performance of each model on the validation data is a common method to monitor machine learning model performance over time. The validation data is a subset of the data that is not used for model training, but is used to evaluate its performance during training and to compare different versions of the model. By comparing the loss performance of each model on the same validation data, you can determine which version of the model has better performance.
upvoted 4 times
...
enghabeth
2 years, 9 months ago
Selected Answer: D
If you have multiple model versions in a single model and have created an evaluation job for each one, you can view a chart comparing the mean average precision of the model versions over time
upvoted 1 times
...
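For reference, the average precision that the Continuous Evaluation chart compares across model versions can be computed from a ranked list of labels. This pure-Python sketch is illustrative only, not the service's implementation.

```python
def average_precision(ranked_labels):
    """Average precision for one class: `ranked_labels` is the list of
    ground-truth labels (1 = relevant) ordered by descending model score."""
    hits, total = 0, 0.0
    for k, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / k  # precision@k, accumulated at each relevant hit
    return total / hits if hits else 0.0

def mean_average_precision(per_class_rankings):
    """mAP: the mean of per-class average precisions, which is the quantity
    the Continuous Evaluation chart compares across model versions."""
    aps = [average_precision(r) for r in per_class_rankings]
    return sum(aps) / len(aps)

# A perfect ranking gives AP = 1.0; a worse ranking gives less.
assert average_precision([1, 1, 0, 0]) == 1.0
assert round(average_precision([0, 1, 0, 1]), 3) == 0.5
```

Because the metric is computed on fresh ground-truthed predictions as they arrive, it supports comparing model versions over time rather than a one-time check.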
guilhermebutzke
2 years, 9 months ago
Guys, I'm not sure about answer D, and maybe you could help me with my arguments. I think choosing loss to compare model performance is better than looking at metrics. For example, we can build an image classification model that has good precision metrics because the classes are unbalanced, but whose loss is terrible because of the kind of loss chosen, which penalizes classes. So losses are better than metrics for evaluating models, and the answer is A or B. I thought A could be the answer because I see validation as part of the training process. So, if we want to test model performance over time, we have to use new data, which I suppose to be the held-out data.
upvoted 3 times
...

Topic 1 Question 22


Exam Professional Machine Learning Engineer topic 1 question 22 discussion

You trained a text classification model. You have the following SignatureDefs:

You started a TensorFlow Serving component server and tried to send an HTTP request to get a prediction using:
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/text_model:predict', data=data, headers=headers)
What is the correct way to write the predict request?

  • A. data = json.dumps({"signature_name": "seving_default", "instances": [['ab', 'bc', 'cd']]})
  • B. data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b', 'c', 'd', 'e', 'f']]})
  • C. data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b', 'c'], ['d', 'e', 'f']]})
  • D. data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b'], ['c', 'd'], ['e', 'f']]})
Suggested Answer: D 🗳️

Comments

maartenalexander
Highly Voted 4 years, 4 months ago
Most likely D. A negative number in the shape enables auto-expand (https://stackoverflow.com/questions/37956197/what-is-the-negative-index-in-shape-arrays-used-for-tensorflow). The first number, -1, in the shape (-1, 2) specifies the number of 1-dimensional arrays within the tensor (and it can auto-expand), while the second number (2) fixes the number of elements in each inner array at 2. Hence D.
upvoted 24 times
...
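Option D can be sanity-checked locally, since the shape constraint is purely about the JSON body; no TensorFlow Serving instance is needed to verify that each instance has exactly 2 elements.

```python
import json

# Option D: any number of instances (the -1 dimension), each with exactly
# 2 elements (the fixed dimension of the (-1, 2) input shape).
data = json.dumps({
    "signature_name": "serving_default",
    "instances": [["a", "b"], ["c", "d"], ["e", "f"]],
})

payload = json.loads(data)
assert payload["signature_name"] == "serving_default"
assert all(len(instance) == 2 for instance in payload["instances"])

# Sending it (requires a running TensorFlow Serving container):
# requests.post("http://localhost:8501/v1/models/text_model:predict",
#               data=data, headers={"content-type": "application/json"})
```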
jkkim_jt
Highly Voted 1 year ago
Selected Answer: D
the shape (-1, 2) indicates that the data can have any number of rows (denoted by -1), but must have exactly 2 columns. In machine learning, especially in frameworks like TensorFlow or Keras, the -1 acts as a placeholder for dynamic batch sizes, meaning the model can process inputs with any number of samples (rows), but each sample must have exactly 2 features (columns).
upvoted 6 times
...
PhilipKoku
Most Recent 1 year, 5 months ago
Selected Answer: D
D) Any rows, 2 columns.
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: D
Went with D
upvoted 2 times
...
wish0035
2 years, 11 months ago
Selected Answer: D
ans: D
upvoted 1 times
...
EFIGO
2 years, 11 months ago
Selected Answer: D
Having "shape=[-1,2]", the input can have as many rows as we want, but each row needs to be of 2 elements. The only option satisfying this requirement is D.
upvoted 1 times
...
GCP72
3 years, 2 months ago
Selected Answer: D
Correct answer is "D"
upvoted 1 times
...
Mohamed_Mossad
3 years, 5 months ago
Selected Answer: D
will vote for D , as the data shape in instances matches the shape in signature def
upvoted 1 times
...
pml2021
3 years, 7 months ago
Selected Answer: D
The shape is (-1, 2), indicating any number of rows but only 2 columns.
upvoted 2 times
...
mousseUwU
4 years ago
D is correct if shape(-1,2) means 2 columns for each row
upvoted 3 times
mousseUwU
4 years ago
Link to explanation: https://stackoverflow.com/questions/37956197/what-is-the-negative-index-in-shape-arrays-used-for-tensorflow
upvoted 1 times
...
...
Danny2021
4 years, 2 months ago
D: (-1, 2) represents a vector with any number of rows but only 2 columns.
upvoted 5 times
...
inder0007
4 years, 5 months ago
Correct answer is D, the shapes otherwise don't matter
upvoted 4 times
...

Topic 1 Question 23


Exam Professional Machine Learning Engineer topic 1 question 23 discussion

Your organization's call center has asked you to develop a model that analyzes customer sentiments in each call. The call center receives over one million calls daily, and data is stored in Cloud Storage. The data collected must not leave the region in which the call originated, and no Personally Identifiable Information (PII) can be stored or analyzed. The data science team has a third-party tool for visualization and access which requires a SQL ANSI-2011 compliant interface. You need to select components for data processing and for analytics. How should the data pipeline be designed?

  • A. 1= Dataflow, 2= BigQuery
  • B. 1 = Pub/Sub, 2= Datastore
  • C. 1 = Dataflow, 2 = Cloud SQL
  • D. 1 = Cloud Function, 2= Cloud SQL
Suggested Answer: A 🗳️

Comments

inder0007
Highly Voted 3 years, 11 months ago
The correct answer is A
upvoted 19 times
GogoG
3 years, 7 months ago
Evidence here https://github.com/GoogleCloudPlatform/dataflow-contact-center-speech-analysis
upvoted 7 times
...
...
salsabilsf
Highly Voted 3 years, 11 months ago
Should be A
upvoted 8 times
...
coupet
Most Recent 7 months, 1 week ago
Selected Answer: A
Correct answer: A. Dataflow allows you to create data pipelines that read from one or more sources, transform the data, and write it to a destination. BigQuery is designed for large-scale analytics on structured and semi-structured data.
upvoted 1 times
...
PhilipKoku
11 months, 1 week ago
Selected Answer: A
A) Dataflow & BigQuery (Analytics)
upvoted 2 times
...
Sum_Sum
1 year, 5 months ago
Selected Answer: A
A - because it has BigQuery. Almost never would you see an answer that prefers CloudSQL over BQ
upvoted 3 times
...
M25
2 years ago
Selected Answer: A
Went with A
upvoted 2 times
...
MithunDesai
2 years, 4 months ago
Selected Answer: A
correct answer is A
upvoted 1 times
...
Moulichintakunta
2 years, 5 months ago
Selected Answer: A
We need Dataflow to process the data from Cloud Storage. The data is unstructured, and if we want to analyze unstructured data through a SQL interface, BigQuery is the only option.
upvoted 1 times
...
EFIGO
2 years, 5 months ago
Selected Answer: A
You need to do analytics, so the answer needs to contain BigQuery, and only option A does. Moreover, BigQuery is fine with SQL, and Dataflow is the right tool for the processing pipeline.
upvoted 1 times
...
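Since no PII may be stored or analyzed, the processing step (Dataflow in option A) would redact PII before loading results into BigQuery. A pure-Python sketch of the idea; the regex patterns and the `redact_pii` helper are illustrative assumptions, and in practice Cloud DLP would handle the detection:

```python
import re

# Illustrative patterns; a real pipeline would use Cloud DLP infoTypes instead.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
}

def redact_pii(text: str) -> str:
    """Replace anything matching a PII pattern with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

transcript = "Call me back at +1 555-123-4567 or jane.doe@example.com"
print(redact_pii(transcript))  # Call me back at [PHONE] or [EMAIL]
```

The redacted records can then land in BigQuery, which satisfies the SQL ANSI-2011 requirement for the visualization tool.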
GCP72
2 years, 8 months ago
Selected Answer: A
Correct answer is "A"
upvoted 1 times
...
SUNWS7
2 years, 11 months ago
D - to call an API you need Cloud Functions. Dataflow would be for ETL.
upvoted 2 times
SUNWS7
2 years, 11 months ago
Sorry, incorrect: Dataflow can call external APIs, so I stand corrected. Answer: A
upvoted 2 times
...
...
SUNWS7
3 years, 1 month ago
Selected Answer: A
Dataflow & BigQuery
upvoted 2 times
...
skipper_com
3 years, 5 months ago
A, https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build Fig.6
upvoted 1 times
...
mousseUwU
3 years, 6 months ago
A is correct. Dataflow: unified stream and batch data processing that's serverless, fast, and cost-effective. BigQuery: good for analytics and dashboards.
upvoted 3 times
...
pddddd
3 years, 7 months ago
BQ is SQL ANSI-2011 compliant
upvoted 1 times
...
Danny2021
3 years, 8 months ago
A or C. Not sure how many third-party tools support BigQuery. If not, then the answer is C.
upvoted 2 times
David_ml
3 years ago
Wrong. Cloud SQL is not for analytics.
upvoted 1 times
...
...
Jijiji
3 years, 8 months ago
it's def A
upvoted 3 times
...

Topic 1 Question 24

Exam Professional Machine Learning Engineer topic 1 question 24 discussion

You are an ML engineer at a global shoe store. You manage the ML models for the company's website. You are asked to build a model that will recommend new products to the user based on their purchase behavior and similarity with other users. What should you do?

  • A. Build a classification model
  • B. Build a knowledge-based filtering model
  • C. Build a collaborative-based filtering model
  • D. Build a regression model using the features as predictors
Suggested Answer: C 🗳️

Comments

maartenalexander
Highly Voted 4 years, 4 months ago
C. Collaborative filtering is about user similarity and product recommendations. Other models won't work
upvoted 23 times
...
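As a minimal sketch of what "similarity with other users" means in collaborative filtering (toy data and function names are illustrative; a real recommender would use matrix factorization over a large sparse purchase matrix):

```python
from math import sqrt

# Toy purchase matrix: users x items (1 = bought, 0 = not bought).
purchases = {
    "alice": [1, 1, 0, 0],
    "bob":   [1, 1, 1, 0],
    "carol": [0, 0, 1, 1],
}

def cosine(u, v):
    """Cosine similarity between two purchase vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(user):
    """Score unseen items by similarity-weighted purchases of other users."""
    scores = [0.0] * len(purchases[user])
    for other, vec in purchases.items():
        if other == user:
            continue
        sim = cosine(purchases[user], vec)
        for i, bought in enumerate(vec):
            if bought and not purchases[user][i]:
                scores[i] += sim
    return scores

print(recommend("alice"))  # item 2 scores highest: bob, most similar to alice, bought it
```

This is why the other options don't fit: nothing here is a class label or a regression target; the signal is purely user-user similarity over behavior.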
coupet
Most Recent 7 months, 1 week ago
Selected Answer: C
Collaborative filtering models, a core technique in recommender systems, predict user preferences by analyzing interactions between users and items, identifying similar users or items, and recommending items liked by similar users or the user in question
upvoted 2 times
...
DaleR
1 year ago
C. Collaborative filtering is a foundational model for building a recommendation system, as the input dataset is simple and the embeddings are learned for you. Matrix factorization is simply a model that implements collaborative filtering.
upvoted 2 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: C
C) Collaborative filtering model
upvoted 1 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: C
ChatGPT: Collaborative filtering models are specifically designed for recommendation systems. They work by analyzing the interactions and behaviors of users and items, then making predictions about what users will like based on similarities with other users. In this case, since you're looking at purchase behavior and user similarities, a collaborative filtering approach is well-suited to identify and recommend products that users with similar behaviors have liked or purchased. Classification models (Option A) and regression models (Option D) are generally used for different types of predictive modeling tasks, not specifically for recommendations. A knowledge-based filtering model (Option B), while useful in recommendation systems, relies more on explicit knowledge about users and items, rather than on user interaction patterns and similarities, which seems to be the focus in this scenario.
upvoted 2 times
...
10SR
2 years, 2 months ago
C. Collaborative filtering is apt amongst the answers
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 2 times
...
wish0035
2 years, 11 months ago
Selected Answer: C
ans: C
upvoted 1 times
...
hiromi
2 years, 11 months ago
Selected Answer: C
C https://cloud.google.com/blog/topics/developers-practitioners/looking-build-recommendation-system-google-cloud-leverage-following-guidelines-identify-right-solution-you-part-i
upvoted 1 times
...
EFIGO
2 years, 11 months ago
Selected Answer: C
This is a textbook application of collaborative filtering, C is the correct answer
upvoted 1 times
...
GCP72
3 years, 2 months ago
Selected Answer: C
Correct answer is "C"
upvoted 1 times
...
Mohamed_Mossad
3 years, 5 months ago
Selected Answer: C
https://developers.google.com/machine-learning/recommendation/collaborative/basics
upvoted 1 times
...
giaZ
3 years, 8 months ago
Selected Answer: C
Definitely C
upvoted 2 times
...
caohieu04
3 years, 8 months ago
Selected Answer: C
Community vote
upvoted 2 times
...
xiaoF
3 years, 9 months ago
should be C
upvoted 2 times
...
mousseUwU
4 years ago
C - https://cloud.google.com/architecture/recommendations-using-machine-learning-on-compute-engine#filtering_the_data
upvoted 4 times
...

Topic 1 Question 25

Exam Professional Machine Learning Engineer topic 1 question 25 discussion

You work for a social media company. You need to detect whether posted images contain cars. Each training example is a member of exactly one class. You have trained an object detection neural network and deployed the model version to AI Platform Prediction for evaluation. Before deployment, you created an evaluation job and attached it to the AI Platform Prediction model version. You notice that the precision is lower than your business requirements allow. How should you adjust the model's final layer softmax threshold to increase precision?

  • A. Increase the recall.
  • B. Decrease the recall.
  • C. Increase the number of false positives.
  • D. Decrease the number of false negatives.
Suggested Answer: B 🗳️

Comments

Paul_Dirac
Highly Voted 3 years, 10 months ago
Decreasing FN increases recall (D). So D and A are the same. Increasing FP decreases precision (C). Answer: B ("improving precision typically reduces recall and vice versa", https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall)
upvoted 34 times
Swagluke
3 years, 8 months ago
I do believe B is the right answer, but D and A aren't exactly the same. "Increase recall" (A) can mean either: 1. keeping TP + FN the same while increasing TP and decreasing FN, where it isn't clear how precision is affected, since both TP and TP + FP increase; or 2. keeping TP the same while increasing TP + FN, i.e. increasing FN (the opposite of D), and it's not clear how that affects precision either.
upvoted 4 times
...
...
Danny2021
Highly Voted 3 years, 8 months ago
Precision = TP / (TP + FP); Recall = TP / (TP + FN).
A. Increase recall -> will decrease precision.
B. Decrease recall -> will increase precision.
C. Increase the false positives -> will decrease precision.
D. Decrease the false negatives -> will increase recall, reducing precision.
The correct answer is B.
upvoted 24 times
...
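The trade-off described in these comments can be checked numerically: raising the decision threshold on the final-layer scores increases precision while decreasing recall. A small pure-Python sketch with made-up scores and labels:

```python
def precision_recall(scores, labels, threshold):
    """Compute precision and recall at a given decision threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up softmax scores for the "car" class and true labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30]
labels = [1,    1,    0,    1,    0,    1,    0]

for t in (0.5, 0.75):
    p, r = precision_recall(scores, labels, t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
```

On this toy data, moving the threshold from 0.5 to 0.75 raises precision from 0.60 to 0.67 and drops recall from 0.75 to 0.50, which is exactly "decrease the recall" (B).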
PhilipKoku
Most Recent 11 months, 1 week ago
Selected Answer: B
B) Decrease Recall (increases precision)
upvoted 2 times
...
SamuelTsch
1 year, 10 months ago
Selected Answer: B
To increase precision, you have to decrease recall: increase true positives, accept more false negatives, and decrease false positives.
upvoted 2 times
...
M25
2 years ago
Selected Answer: B
Went with B
upvoted 3 times
...
Fatiy
2 years, 2 months ago
Selected Answer: B
Option B is the best approach because raising the threshold will increase precision by reducing the number of false positives.
upvoted 1 times
...
John_Pongthorn
2 years, 4 months ago
Selected Answer: B
A, C, and D amount to the same thing, so I go with B; it is a threshold adjustment up from 0.5.
upvoted 1 times
John_Pongthorn
2 years, 4 months ago
We want to increase precision, which is the same as decreasing recall; the two are opposed to each other. https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall
upvoted 1 times
...
...
wish0035
2 years, 4 months ago
ans: B. A: would decrease precision even more. C: would decrease precision. D: would increase recall (precision would stay the same).
upvoted 1 times
...
EFIGO
2 years, 5 months ago
Selected Answer: B
Precision and recall are negatively correlated: when one goes up, the other goes down and vice versa. To increase precision we need to decrease recall, therefore answer B. (To be more complete, answers C and D are wrong because they would both increase recall, according to the recall formula.)
upvoted 2 times
...
GCP72
2 years, 8 months ago
Selected Answer: C
Correct answer is "C"
upvoted 1 times
GCP72
2 years, 8 months ago
sorry correct ans is " B"
upvoted 1 times
...
...
originalliang
2 years, 9 months ago
Answer is D. If the dataset does not change, TP + FN is constant; if FN goes down, then TP goes up. Hence Precision = TP / (TP + FP) goes up.
upvoted 2 times
...
Mohamed_Mossad
2 years, 11 months ago
Selected Answer: B
precision and recall have negative proportion , so to increase precision reduce recall
upvoted 1 times
...
morgan62
3 years, 1 month ago
Selected Answer: B
It's B. C and D would basically ruin your model.
upvoted 1 times
...
sonxxx
3 years, 2 months ago
Answer: D. Precision answers the question: how many retrieved items are relevant? Given the relation between false negatives and true positives, optimal precision needs a high number of true positives. If your model's precision is lower than your business requirement, it is because the model has a high number of false negatives. Check it in: https://en.wikipedia.org/wiki/Precision_and_recall
upvoted 2 times
...
xiaoF
3 years, 3 months ago
Selected Answer: B
definitely B
upvoted 1 times
...
Sangy22
3 years, 4 months ago
I think this should be C. The reason is, to increase precision, the classification threshold for whether the car is there or not should be kept low. That way, even when the model is not very confident (say only 60% confident), it will say yes, the car is there. This will increase the number of times the model says a car is present, driving up precision (when it says a car is there, a car is really there). The consequence is that false positives will increase too, reducing recall. So C is my choice. Choices A and B are not really right, as precision and recall are after-effects, not something you control ahead of time.
upvoted 1 times
...
Bemnet
3 years, 5 months ago
Answer is B, 100% sure. The only way to affect precision and recall is by adjusting the threshold. FN and FP move in opposite directions, so C and D are the same. A: increasing recall decreases precision.
upvoted 3 times
...

Topic 1 Question 26

Exam Professional Machine Learning Engineer topic 1 question 26 discussion

You are responsible for building a unified analytics environment across a variety of on-premises data marts. Your company is experiencing data quality and security challenges when integrating data across the servers, caused by the use of a wide range of disconnected tools and temporary solutions. You need a fully managed, cloud-native data integration service that will lower the total cost of work and reduce repetitive work. Some members on your team prefer a codeless interface for building Extract, Transform, Load (ETL) process. Which service should you use?

  • A. Dataflow
  • B. Dataprep
  • C. Apache Flink
  • D. Cloud Data Fusion
Suggested Answer: D 🗳️

Comments

PhilipKoku
11 months, 1 week ago
Selected Answer: D
D) Cloud Data Fusion
upvoted 1 times
...
pinimichele01
1 year ago
Selected Answer: D
codeless interface -> D
upvoted 2 times
...
Sum_Sum
1 year, 5 months ago
Selected Answer: D
D is correct
upvoted 1 times
...
SamuelTsch
1 year, 10 months ago
Selected Answer: D
I think D is correct.
upvoted 1 times
...
M25
2 years ago
Selected Answer: D
Went with D
upvoted 1 times
...
FDS1993
2 years, 2 months ago
Selected Answer: B
Answer is B
upvoted 1 times
...
Fatiy
2 years, 2 months ago
Selected Answer: D
Cloud Data Fusion is a fully managed, cloud-native data integration service provided by Google Cloud Platform. It is designed to simplify the process of building and managing ETL pipelines across a variety of data sources and targets.
upvoted 3 times
OpenKnowledge
2 months, 1 week ago
Google Cloud Data Fusion is a codeless data integration service, allowing users to build and manage data pipelines using a visual, drag-and-drop interface with pre-built connectors and transformations instead of writing code.
upvoted 1 times
...
...
EFIGO
2 years, 5 months ago
Selected Answer: D
"codeless interface" ==> Data Fusion
upvoted 3 times
...
GCP72
2 years, 8 months ago
Selected Answer: D
Correct answer is "D"
upvoted 1 times
...
capt2101akash
2 years, 9 months ago
Selected Answer: D
D is correct as it is codeless
upvoted 1 times
...
Mohamed_Mossad
2 years, 11 months ago
Selected Answer: D
https://cloud.google.com/data-fusion/docs/concepts/overview#using_the_code-free_web_ui
upvoted 1 times
...
morgan62
3 years, 1 month ago
Selected Answer: D
D without any doubt
upvoted 2 times
...
xiaoF
3 years, 3 months ago
D. Data Fusion is designed more for data ingestion from one source to another, with few transformations. Dataprep is designed more for data preparation (as its name implies): data cleaning, new column creation, column splitting. Dataprep also provides insights into the data to help you with your recipes.
upvoted 3 times
...
majejim435
3 years, 6 months ago
D. Dataprep would also work but Data Fusion is better suited. (See https://stackoverflow.com/questions/58175386/can-google-data-fusion-make-the-same-data-cleaning-than-dataprep)
upvoted 2 times
...
mousseUwU
3 years, 6 months ago
D is correct. "Visual point-and-click interface enabling code-free deployment of ETL/ELT data pipelines" and "Operate high volumes of data pipelines periodically." Source: https://cloud.google.com/data-fusion#all-features
upvoted 4 times
...
raintree
3 years, 8 months ago
B. Dataprep makes use of Apache Beam, which can process streaming and batch, and thus prevents training-serving skew.
upvoted 2 times
...

Topic 1 Question 27

Exam Professional Machine Learning Engineer topic 1 question 27 discussion

You are an ML engineer at a regulated insurance company. You are asked to develop an insurance approval model that accepts or rejects insurance applications from potential customers. What factors should you consider before building the model?

  • A. Redaction, reproducibility, and explainability
  • B. Traceability, reproducibility, and explainability
  • C. Federated learning, reproducibility, and explainability
  • D. Differential privacy, federated learning, and explainability
Suggested Answer: B 🗳️

Comments

gcp2021go
Highly Voted 4 years, 3 months ago
I think the answer should be B. As I reviewed the OECD document on the impact of AI on insurance, the document mentions explainability and traceability. However, open for discussion. https://www.oecd.org/finance/Impact-Big-Data-AI-in-the-Insurance-Sector.pdf
upvoted 33 times
...
salsabilsf
Highly Voted 4 years, 5 months ago
Should be B
upvoted 13 times
DucLee3110
4 years, 4 months ago
I think it should be A: as it is regulated, PII would need to be redacted.
upvoted 3 times
...
...
Yashd2012
Most Recent 1 month, 2 weeks ago
Selected Answer: B
Correct Answer: B. Traceability, reproducibility, and explainability Because regulators require: Traceability → auditors need to know what data, model, and version made each decision. Reproducibility → you must reproduce results if challenged in court/regulatory inquiry. Explainability → customers & regulators must understand why an application was accepted/rejected (to avoid bias/discrimination).
upvoted 3 times
...
Rajashekharc
1 year, 2 months ago
As per ChatGPT, the answer is B.
upvoted 1 times
...
dija123
1 year, 4 months ago
Selected Answer: B
Agree with B
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: B
B) Traceability, Reproducibility and Explainability
upvoted 1 times
...
Goosemoose
1 year, 5 months ago
A kinda makes sense here, because redaction means removing sensitive or private information before sharing it.
upvoted 2 times
...
gscharly
1 year, 6 months ago
Selected Answer: B
went with B
upvoted 2 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: B
B. Traceability, reproducibility, and explainability. Traceability: This involves maintaining records of the data, decisions, and processes used in the model. This is crucial in regulated industries for audit purposes and to ensure compliance with regulatory standards. It helps in understanding how the model was developed and how it makes decisions. Reproducibility: Ensuring that the results of the model can be reproduced using the same data and methods is vital for validating the model's reliability and for future development or debugging. Explainability: Given the significant impact of the model’s decisions on individuals' lives, it's crucial that the model's decisions can be explained in understandable terms. This is not just a best practice in AI ethics; in many jurisdictions, it's a legal requirement under regulations that mandate transparency in automated decision-making.
upvoted 5 times
nmnm22
1 year, 5 months ago
you are a lifesaver, sum sum. thank you
upvoted 1 times
...
...
tavva_prudhvi
2 years, 4 months ago
Selected Answer: B
B. Traceability, reproducibility, and explainability are the most important factors to consider before building an insurance approval model. Traceability ensures that the data used in the model is reliable and can be traced back to its source. Reproducibility ensures that the model can be replicated and tested to ensure its accuracy and fairness. Explainability ensures that the model's decisions can be explained to customers and regulators in a transparent manner. These factors are crucial for building a trustworthy and compliant model for an insurance company. Redaction is also important for protecting sensitive customer information, but it is not as critical as the other factors listed. Federated learning and differential privacy are techniques used to protect data privacy, but they are not necessarily required for building an insurance approval model.
upvoted 4 times
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 1 times
...
shankalman717
2 years, 8 months ago
Selected Answer: B
B. Traceability, reproducibility, and explainability When developing an insurance approval model, it's crucial to consider several factors to ensure that the model is fair, accurate, and compliant with regulations. The factors to consider include: Traceability: It's important to be able to trace the data used to build the model and the decisions made by the model. This is important for transparency and accountability. Reproducibility: The model should be built in a way that allows for its reproducibility. This means that other researchers should be able to reproduce the same results using the same data and methods. Explainability: The model should be able to provide clear and understandable explanations for its decisions. This is important for building trust with customers and ensuring compliance with regulations. Other factors that may also be important to consider, depending on the specific context of the insurance company and its customers, include data privacy and security, fairness, and bias mitigation.
upvoted 4 times
...
ares81
2 years, 10 months ago
Selected Answer: D
Checking Google documents, it seems D.
upvoted 2 times
tavva_prudhvi
2 years, 8 months ago
Please mention the links
upvoted 2 times
...
...
wish0035
2 years, 11 months ago
Selected Answer: B
ans: B
upvoted 1 times
...
GCP72
3 years, 2 months ago
Selected Answer: B
Correct answer is "B"
upvoted 1 times
...
capt2101akash
3 years, 3 months ago
Selected Answer: D
Should be D, as all of those techniques address problems related to insurance.
upvoted 3 times
...

Topic 1 Question 28

Exam Professional Machine Learning Engineer topic 1 question 28 discussion

You are training a ResNet model on AI Platform using TPUs to visually categorize types of defects in automobile engines. You capture the training profile using the Cloud TPU profiler plugin and observe that it is highly input-bound. You want to reduce the bottleneck and speed up your model training process. Which modifications should you make to the tf.data dataset? (Choose two.)

  • A. Use the interleave option for reading data.
  • B. Reduce the value of the repeat parameter.
  • C. Increase the buffer size for the shuffle option.
  • D. Set the prefetch option equal to the training batch size.
  • E. Decrease the batch size argument in your transformation.
Suggested Answer: AD 🗳️

Comments

ralf_cc
Highly Voted 3 years, 10 months ago
AD - please weigh in guys
upvoted 40 times
...
danielp14021990
Highly Voted 3 years, 6 months ago
A. Use the interleave option for reading data. - Yes, that helps to parallelize data reading.
B. Reduce the value of the repeat parameter. - No, this only repeats rows of the dataset.
C. Increase the buffer size for the shuffle option. - No, a larger shuffle buffer improves randomization but does not remove the input bottleneck.
D. Set the prefetch option equal to the training batch size. - Yes, this pre-loads the data.
E. Decrease the batch size argument in your transformation. - No, it could be even slower due to more I/Os.
https://www.tensorflow.org/guide/data_performance
upvoted 27 times
...
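The two winning options correspond roughly to `dataset.interleave(..., num_parallel_calls=tf.data.AUTOTUNE)` and `dataset.prefetch(...)` in tf.data. The mechanism behind prefetch, a background producer staying ahead of the consumer so input production overlaps with training, can be sketched in plain Python (this `prefetch` helper is an illustrative stand-in, not the TensorFlow API):

```python
import threading
import queue

def prefetch(iterable, buffer_size):
    """Yield items from iterable, filling a bounded buffer on a background
    thread so production overlaps with consumption (like tf.data prefetch)."""
    q = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking end of stream

    def producer():
        for item in iterable:
            q.put(item)
        q.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is done:
            return
        yield item

# The consumer sees the same elements, but reads come from the buffer.
batches = (f"batch-{i}" for i in range(5))
print(list(prefetch(batches, buffer_size=2)))
```

The consumer's output order is unchanged; only the timing improves, since the next batches are prepared while the accelerator is busy with the current one.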
PhilipKoku
Most Recent 11 months, 1 week ago
Selected Answer: AD
A) and D) are the right answers!
upvoted 1 times
...
harithacML
1 year, 10 months ago
Selected Answer: AD
A and D : https://www.tensorflow.org/guide/data_performance , interleave and prefetch
upvoted 2 times
...
M25
2 years ago
Selected Answer: AD
Went with A & D
upvoted 2 times
...
MithunDesai
2 years, 4 months ago
Selected Answer: AD
yes AD
upvoted 1 times
...
OJ42
2 years, 8 months ago
Selected Answer: AD
Yes AD
upvoted 1 times
...
GCP72
2 years, 8 months ago
Selected Answer: AD
YES.....AD - agree with danielp1
upvoted 1 times
...
u_phoria
2 years, 9 months ago
Selected Answer: AD
AD - agree with danielp1 By the way, this is handy to understand the significance of shuffle buffer_size: https://stackoverflow.com/a/48096625/1933315
upvoted 2 times
...
onku
2 years, 10 months ago
Selected Answer: DE
I think D & E are correct.
upvoted 1 times
...
Xrobat
2 years, 10 months ago
AD should be the right answer.
upvoted 3 times
...
eddy1234567890
2 years, 10 months ago
Answers?
upvoted 1 times
...
93alejandrosanchez
3 years, 6 months ago
For me it should be D and E as well. Prefetching will help read data while training is performed, which helps with the bottleneck, so D is for sure right. I think decreasing the batch size would help too, because fewer records will be read in each training step (reading a lot of records would lead to the bottleneck described, as reading data is costly). I'm not 100% sure on A; personally I don't think processing many input files concurrently would help in this case, because the reading operation is precisely the problem. However, I'm no expert in this topic, so I might be wrong.
upvoted 2 times
klemiec
3 years, 2 months ago
D is not the correct answer. Instead of decreasing the batch size, increasing it may help. (https://cloud.google.com/tpu/docs/performance-guide - "TPU model performance" section)
upvoted 1 times
Goosemoose
11 months, 1 week ago
you mean E, not D, right?
upvoted 1 times
...
...
...
gcp2021go
3 years, 9 months ago
I think it should be DE. I found this article https://towardsdatascience.com/overcoming-data-preprocessing-bottlenecks-with-tensorflow-data-service-nvidia-dali-and-other-d6321917f851
upvoted 3 times
...

Topic 1 Question 29

Exam Professional Machine Learning Engineer topic 1 question 29 discussion

You have trained a model on a dataset that required computationally expensive preprocessing operations. You need to execute the same preprocessing at prediction time. You deployed the model on AI Platform for high-throughput online prediction. Which architecture should you use?

  • A. Validate the accuracy of the model that you trained on preprocessed data. Create a new model that uses the raw data and is available in real time. Deploy the new model onto AI Platform for online prediction.
  • B. Send incoming prediction requests to a Pub/Sub topic. Transform the incoming data using a Dataflow job. Submit a prediction request to AI Platform using the transformed data. Write the predictions to an outbound Pub/Sub queue.
  • C. Stream incoming prediction request data into Cloud Spanner. Create a view to abstract your preprocessing logic. Query the view every second for new records. Submit a prediction request to AI Platform using the transformed data. Write the predictions to an outbound Pub/Sub queue.
  • D. Send incoming prediction requests to a Pub/Sub topic. Set up a Cloud Function that is triggered when messages are published to the Pub/Sub topic. Implement your preprocessing logic in the Cloud Function. Submit a prediction request to AI Platform using the transformed data. Write the predictions to an outbound Pub/Sub queue.
Suggested Answer: B 🗳️

Comments

SparkExpedition
Highly Voted 4 years, 4 months ago
Supporting B ..https://cloud.google.com/architecture/data-preprocessing-for-ml-with-tf-transform-pt1#where_to_do_preprocessing
upvoted 32 times
...
inder0007
Highly Voted 4 years, 5 months ago
I think it should b B
upvoted 14 times
q4exam
4 years, 1 month ago
I also agree with B, this is how I would advise clients to do it as well
upvoted 4 times
...
...
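Whichever transport is chosen (option B uses Pub/Sub and Dataflow), the underlying requirement is that the serving path runs exactly the same preprocessing as training, to avoid training/serving skew. A minimal sketch, assuming a shared module that both the training job and the streaming pipeline import (the field names and scaling constants below are illustrative):

```python
# One preprocessing function, imported by BOTH the training job and the
# streaming prediction pipeline, so the two paths can never diverge.
def preprocess(record: dict) -> list:
    """Turn a raw input record into the model's feature vector."""
    temp_norm = (record["temp_c"] - 20.0) / 10.0    # fixed-constant scaling
    vib = record["vibration"] ** 0.5                # dampen large outliers
    is_work_hour = 1.0 if 8 <= record["hour"] < 18 else 0.0
    return [temp_norm, vib, is_work_hour]

# Training time: features = [preprocess(r) for r in training_records]
# Serving time (e.g. inside the Dataflow job, per Pub/Sub message): same call.
raw = {"temp_c": 25.0, "vibration": 4.0, "hour": 10}
print(preprocess(raw))  # [0.5, 2.0, 1.0]
```

Dataflow fits here because the transformation is computationally expensive and must run at high throughput; the pipeline applies `preprocess` to each message before calling the prediction service.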
06a6df9
Most Recent 3 months, 3 weeks ago
Selected Answer: D
I think the "high-throughput" is the critical differentiator between choosing B or D. Dataflow is designed for large-scale stream or batch processing, not for low-latency, single-request online prediction. The startup latency of a Dataflow job makes it unsuitable for this pattern.
upvoted 1 times
Fer660
2 months, 3 weeks ago
I would argue that the dataflow job is 'always on' in streaming mode for this use case, whereas the cloud function would incur startup latency. So I chose B rather than D.
upvoted 1 times
...
...
IrribarraC
8 months, 3 weeks ago
Selected Answer: B
Dataflow has autoscale. And in my experience, you use Cloud Functions to small stuff.
upvoted 3 times
...
ship123
10 months, 2 weeks ago
Selected Answer: D
You are an ML engineer who has trained a model on a dataset that required computationally expensive preprocessing operations. You need to execute the same preprocessing at prediction time. You deployed the model on the Vertex AI platform for high-throughput online prediction. Which architecture should you use? The answer is: Send incoming prediction requests to a Pub/Sub topic. Set up a Cloud Function that is triggered when messages are published to the Pub/Sub topic. Implement your preprocessing logic in the Cloud Function. Submit a prediction request to the Vertex AI platform using the transformed data. Write the predictions to an outbound Pub/Sub queue.
upvoted 1 times
...
rajshiv
11 months, 1 week ago
Selected Answer: D
B is incorrect. Dataflow is a great option for large-scale data processing but may introduce additional complexity and overhead for a real-time prediction scenario where you just need to preprocess data on-the-fly. This is more appropriate for batch processing or when large volumes of data need to be processed in parallel. Option D is better as it leverages Pub/Sub, Cloud Functions, and AI Platform to preprocess data and obtain predictions without needing complex infrastructure or additional systems like Dataflow or Cloud Spanner.
upvoted 1 times
...
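Whichever service ends up hosting the transform (a Dataflow job or a Cloud Function), the preprocessing step itself is the same logic, applied with statistics saved at training time. A minimal sketch in plain Python; the feature names and the training statistics are illustrative, not from the question:

```python
import math

# Illustrative training-time statistics; in a real pipeline these would be
# computed during training and loaded at serving time so that serving
# preprocessing matches training preprocessing exactly.
TRAIN_MEAN = 50.0
TRAIN_STD = 10.0

def preprocess(record):
    # Scale the raw value with the training-time z-score statistics and add
    # a log-transformed feature, mirroring expensive training-time steps.
    return {
        "id": record["id"],
        "scaled_value": (record["value"] - TRAIN_MEAN) / TRAIN_STD,
        "log_value": math.log1p(record["value"]),
    }

row = preprocess({"id": "a1", "value": 70.0})
```

The debate above is only about where this function runs and at what volume, not about what it does.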
f084277
12 months ago
Selected Answer: B
Dataflow is superior to Cloud Functions for doing data transformations at high volume. The answer is clearly B.
upvoted 2 times
...
bludw
1 year, 4 months ago
Selected Answer: D
D. The issue with B is that Dataflow does not work well with high throughput
upvoted 1 times
desertlotus1211
1 year ago
Dataflow is ideal for handling computationally expensive preprocessing operations, as it scales automatically and can process the data in a distributed manner.
upvoted 1 times
...
f084277
12 months ago
You are incorrect. Dataflow can handle MUCH higher volumes of data than Cloud Functions
upvoted 1 times
...
...
PhilipKoku
1 year, 5 months ago
Selected Answer: B
B) Pub/Sub + Dataflow
upvoted 1 times
...
Liting
2 years, 4 months ago
Selected Answer: B
Went with B; using Dataflow for large-scale data transformation is the best option
upvoted 3 times
...
SamuelTsch
2 years, 4 months ago
Selected Answer: B
I went with B. A is completely wrong. C: first, Cloud Spanner is not designed for high throughput; also, it is not for preprocessing. D: a Cloud Function could not get enough resources to do the computationally expensive transformation.
upvoted 2 times
...
ashu381
2 years, 5 months ago
Selected Answer: B
Because the concern here is high throughput and not specifically latency, it is better to go with option B
upvoted 1 times
...
Voyager2
2 years, 5 months ago
Selected Answer: D
B. Send incoming prediction requests to a Pub/Sub topic. Transform the incoming data using a Dataflow job. Submit a prediction request to AI Platform using the transformed data. Write the predictions to an outbound Pub/Sub queue https://dataintegration.info/building-streaming-data-pipelines-on-google-cloud
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 1 times
...
e707
2 years, 6 months ago
Selected Answer: D
I think it's D as B is not a good choice because it requires you to run a Dataflow job for each prediction request. This is inefficient and can lead to latency issues.
upvoted 3 times
lucaluca1982
2 years, 6 months ago
Yes i agree Dataflow can introduce latency
upvoted 2 times
...
f084277
12 months ago
The question doesn't mention anything about latency
upvoted 1 times
...
...
lucaluca1982
2 years, 7 months ago
Selected Answer: D
I go for D. Option B uses Dataflow, which is more suitable for batch
upvoted 1 times
...
SergioRubiano
2 years, 7 months ago
Selected Answer: B
It's B
upvoted 1 times
...

Topic 1 Question 30


Exam Professional Machine Learning Engineer topic 1 question 30 discussion

Your team trained and tested a DNN regression model with good results. Six months after deployment, the model is performing poorly due to a change in the distribution of the input data. How should you address the input differences in production?

  • A. Create alerts to monitor for skew, and retrain the model.
  • B. Perform feature selection on the model, and retrain the model with fewer features.
  • C. Retrain the model, and select an L2 regularization parameter with a hyperparameter tuning service.
  • D. Perform feature selection on the model, and retrain the model on a monthly basis with fewer features.
Suggested Answer: A 🗳️

Comments

celia20200410
Highly Voted 3 years, 9 months ago
A. Data value skews: these skews are significant changes in the statistical properties of the data, which means that data patterns are changing and you need to trigger a retraining of the model to capture these changes. https://developers.google.com/machine-learning/guides/rules-of-ml/#rule_37_measure_trainingserving_skew
upvoted 36 times
oliveolil
3 years, 5 months ago
Rule #37: the difference between the performance on the holdout data and the "next-day" data. Again, this will always exist. You should tune your regularization to maximize next-day performance. However, large drops in performance between holdout and next-day data may indicate that some features are time-sensitive and possibly degrading model performance. Maybe it should be C
upvoted 2 times
...
mousseUwU
3 years, 6 months ago
I agree, A is correct
upvoted 2 times
...
...
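As a rough illustration of the skew alerting that option A describes, one can compare the serving-time mean of a feature against the training distribution. This is a crude stand-in for managed skew monitoring; the data, feature, and threshold below are all illustrative:

```python
import statistics

def mean_shift_alert(train_values, serving_values, threshold=2.0):
    # Alert when the serving-time mean has drifted more than `threshold`
    # training standard deviations away from the training-time mean.
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    shift = abs(statistics.mean(serving_values) - mu) / sigma
    return shift > threshold

train = [10, 11, 9, 10, 12, 8, 10, 11]  # feature values seen at training time
drifted = [25, 27, 24, 26]              # recent serving-time values
```

A real setup would track many features and trigger retraining when alerts fire, which is exactly the monitor-then-retrain loop option A proposes.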
Paul_Dirac
Highly Voted 3 years, 10 months ago
A Data drift doesn't necessarily require feature reselection (e.g. by L2 regularization). https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning#challenges
upvoted 5 times
...
b7ad1d9
Most Recent 1 month, 3 weeks ago
Selected Answer: A
A: it is the simplest answer. The others seem to be solving feature selection again, but the problem is not feature selection, it is input data drift. Rethinking features is not needed; simple monitoring and retraining to readjust the weights is.
upvoted 1 times
...
PhilipKoku
11 months, 1 week ago
Selected Answer: A
A) Monitor the model and set alerts
upvoted 1 times
...
tavva_prudhvi
1 year, 10 months ago
Selected Answer: A
When the distribution of input data changes, the model may not perform as well as it did during training. It is important to monitor the performance of the model in production and identify any changes in the distribution of input data. By creating alerts to monitor for skew, you can detect when the input data distribution has changed and take action to retrain the model using more recent data that reflects the new distribution. This will help ensure that the model continues to perform well in production.
upvoted 2 times
...
M25
2 years ago
Selected Answer: A
Went with A
upvoted 2 times
...
SergioRubiano
2 years, 1 month ago
Selected Answer: A
A is correct
upvoted 1 times
...
tavva_prudhvi
2 years, 2 months ago
It's A. The model itself was performing well, neither overfitting nor failing suddenly; it's a gradual change, so regularization on the original model would not help. C is incorrect.
upvoted 2 times
...
Fatiy
2 years, 2 months ago
Selected Answer: A
Creating alerts to monitor for skew in the input data can help to detect when the distribution of the data has changed and the model's performance is affected. Once a skew is detected, retraining the model with the new data can improve its performance.
upvoted 1 times
...
enghabeth
2 years, 3 months ago
Selected Answer: A
Skew & drift monitoring: Production data tends to constantly change in different dimensions (i.e. time and system wise). And this causes the performance of the model to drop. https://cloud.google.com/vertex-ai/docs/model-monitoring/using-model-monitoring
upvoted 1 times
...
hiromi
2 years, 5 months ago
Selected Answer: A
A You don't need to do feature selection again
upvoted 2 times
...
Mohamed_Mossad
2 years, 10 months ago
Selected Answer: A
A, very obvious, no need for explanation
upvoted 1 times
...
Mohamed_Mossad
2 years, 11 months ago
Selected Answer: A
Obviously A, no tricks here, not too much thinking
upvoted 1 times
...
ggorzki
3 years, 3 months ago
Selected Answer: A
A as celia explained
upvoted 1 times
...
kaike_reis
3 years, 6 months ago
Colleagues who chose (C), pay attention to the question: it says the model was good, so for the skew only solution (A) is necessary.
upvoted 1 times
...
Danny2021
3 years, 8 months ago
A. It is well documented in Google model monitoring docs.
upvoted 2 times
...
gcp2021go
3 years, 9 months ago
Should be C, as L2 regularization prevents overfitting and can potentially maintain model performance if the data distribution is slightly skewed.
upvoted 2 times
...

Topic 1 Question 31


Exam Professional Machine Learning Engineer topic 1 question 31 discussion

You need to train a computer vision model that predicts the type of government ID present in a given image using a GPU-powered virtual machine on Compute
Engine. You use the following parameters:
✑ Optimizer: SGD
✑ Image shape = 224×224
✑ Batch size = 64
✑ Epochs = 10
✑ Verbose =2
During training you encounter the following error: ResourceExhaustedError: Out Of Memory (OOM) when allocating tensor. What should you do?

  • A. Change the optimizer.
  • B. Reduce the batch size.
  • C. Change the learning rate.
  • D. Reduce the image shape.
Suggested Answer: B 🗳️

Comments

maartenalexander
Highly Voted 3 years, 10 months ago
B. I think you want to reduce batch size. Learning rate and optimizer shouldn't really impact memory utilisation. Decreasing the image size (D) would work, but might be costly in terms of final performance
upvoted 26 times
...
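The reason reducing the batch size is the low-effort fix: per-step memory scales linearly with batch size, so halving it halves the footprint without touching the architecture or the input resolution. A back-of-the-envelope sketch (input tensor only; real usage is much larger because every layer's activations are retained for backprop, but those scale linearly in batch size too):

```python
def input_batch_bytes(batch_size, h=224, w=224, c=3, dtype_bytes=4):
    # float32 input-tensor footprint for one training step:
    # batch_size x height x width x channels x bytes-per-element.
    return batch_size * h * w * c * dtype_bytes

full = input_batch_bytes(64)  # the batch size from the question
half = input_batch_bytes(32)  # halving the batch halves the per-step memory
```

With batch size 64 the float32 input tensor alone is roughly 38.5 MB per step; batch size 32 needs exactly half that, and the same proportional saving applies to every activation tensor in the network.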
guruguru
Highly Voted 3 years, 9 months ago
B. https://stackoverflow.com/questions/59394947/how-to-fix-resourceexhaustederror-oom-when-allocating-tensor/59395251#:~:text=OOM%20stands%20for%20%22out%20of,in%20your%20Dense%20%2C%20Conv2D%20layers
upvoted 9 times
...
PhilipKoku
Most Recent 11 months, 1 week ago
Selected Answer: B
B) Reduce the batch size.
upvoted 1 times
...
SamuelTsch
1 year, 10 months ago
Selected Answer: B
no doubt went to B
upvoted 1 times
...
M25
2 years ago
Selected Answer: B
Went with B
upvoted 2 times
...
SergioRubiano
2 years, 1 month ago
Selected Answer: B
B is correct
upvoted 1 times
...
Fatiy
2 years, 2 months ago
Selected Answer: B
By reducing the batch size, the amount of memory required for each iteration of the training process is reduced
upvoted 1 times
...
Fatiy
2 years, 2 months ago
Selected Answer: A
Creating alerts to monitor for skew in the input data can help to detect when the distribution of the data has changed and the model's performance is affected. Once a skew is detected, retraining the model with the new data can improve its performance.
upvoted 1 times
Fatiy
2 years, 2 months ago
Sorry, this is not the response for this question; it's the response to the previous question.
upvoted 1 times
...
...
John_Pongthorn
2 years, 2 months ago
Selected Answer: B
Reduce the image shape != Reduce the image Size.
upvoted 1 times
...
seifou
2 years, 5 months ago
The answer is B. Since you are using SGD, you can use a batch size as small as 1. Ref: https://stackoverflow.com/questions/63139072/batch-size-for-stochastic-gradient-descent-is-length-of-training-data-and-not-1
upvoted 2 times
...
Mohamed_Mossad
2 years, 11 months ago
Selected Answer: B
To fix the memory overflow you need to reduce the batch size. Reducing the input resolution is also valid, but reducing the image size can harm model performance, so the answer is B.
upvoted 3 times
...
alphard
3 years, 5 months ago
B is my option, but D does not seem wrong. Reducing the batch size or reducing the image size can both reduce memory usage, but the former is much easier.
upvoted 2 times
...
kaike_reis
3 years, 6 months ago
B is correct. Option D could be used, as reducing the image size also reduces memory, but it will directly impact the model's performance. Another point: when doing this, if you are using a model built via Keras's `Functional API`, you need to change the definition of the input and also apply preprocessing on the image to reduce its size. In other words: much more work than option B.
upvoted 3 times
...
mousseUwU
3 years, 6 months ago
B is correct, it uses less memory. D works too, but depending on what you need you will lose performance (just like maartenalexander said), so I think it is not recommended.
upvoted 3 times
...
george_ognyanov
3 years, 7 months ago
Initially, I thought D, decreasing the image size, would be the correct one, but now that I am reviewing the test I think maartenalexander is correct in saying a reduced image size might decrease final performance, so I'd go with B eventually.
upvoted 2 times
...

Topic 1 Question 32


Exam Professional Machine Learning Engineer topic 1 question 32 discussion

You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine
(GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?

  • A. Significantly increase the max_batch_size TensorFlow Serving parameter.
  • B. Switch to the tensorflow-model-server-universal version of TensorFlow Serving.
  • C. Significantly increase the max_enqueued_batches TensorFlow Serving parameter.
  • D. Recompile TensorFlow Serving using the source to support CPU-specific optimizations. Instruct GKE to choose an appropriate baseline minimum CPU platform for serving nodes.
Suggested Answer: D 🗳️

Comments

Y2Data
Highly Voted 4 years, 1 month ago
D is correct, since this question focuses on serving performance, where production demands are higher than in development. The system is already throttling, so increasing the pressure on it won't help, and both A and C essentially do that. B is a bit mysterious, but we definitely know that D would work.
upvoted 32 times
mousseUwU
4 years ago
I think it's D too
upvoted 3 times
...
...
pico
Highly Voted 1 year, 12 months ago
Selected Answer: C
https://github.com/tensorflow/serving/blob/master/tensorflow_serving/batching/README.md#batch-scheduling-parameters-and-tuning A may help to some extent, but it primarily affects how many requests are processed in a single batch. It might not directly address latency issues. D is a valid approach for optimizing TensorFlow Serving for CPU-specific optimizations, but it's a more involved process and might not be the quickest way to address latency issues.
upvoted 5 times
...
desertlotus1211
Most Recent 10 months, 2 weeks ago
Selected Answer: D
A is wrong. Increasing max_batch_size batches more requests together, but this introduces delays since the system must wait to accumulate a full batch. This approach can improve throughput but may increase per-query latency, which contradicts the goal of reducing latency.
upvoted 2 times
Fer660
2 months, 3 weeks ago
However, we are getting several thousand QPS, so we should be filling a batch pretty quickly, and A will be OK. I think that D would change the architecture, which is explicitly not desired.
upvoted 1 times
...
...
rajshiv
11 months, 1 week ago
Selected Answer: A
I do not think D is correct as D is focused on optimizing CPU utilization, not on the batching process or managing latency. Since our goal is to improve serving latency, optimizing batching via the max_batch_size parameter is a more straightforward and effective solution.
upvoted 2 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: A
A would work
upvoted 1 times
...
desertlotus1211
1 year ago
max_batch_size: Increasing the max_batch_size parameter allows TensorFlow Serving to process more requests in a single batch. This can improve throughput and reduce latency, especially in high-query environments, as it allows more efficient utilization of CPU resources by processing larger batches of requests at once. Answer A
upvoted 3 times
...
taksan
1 year, 2 months ago
Selected Answer: D
I think the correct answer is D, because the question is about reducing latency. As for A, increasing the batch size might even hurt latency if the system is overwhelmed serving multiple requests at once.
upvoted 2 times
...
chirag2506
1 year, 4 months ago
Selected Answer: D
it is D
upvoted 2 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: C
C) Batch enqueued
upvoted 1 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: D
increasing the max_batch_size TensorFlow Serving parameter, is not the best choice because increasing the batch size may not necessarily improve latency. In fact, it may even lead to higher latency for individual requests, as they will have to wait for the batch to be filled before processing. This may be useful when optimizing for throughput, but not for serving latency, which is the primary goal in this scenario.
upvoted 2 times
...
ichbinnoah
1 year, 12 months ago
Selected Answer: A
I think A is correct, as D implies changes to the infrastructure (question says you must not do that).
upvoted 2 times
edoo
1 year, 8 months ago
This is purely a software optimization and concerns how GKE handles requests. GKE should be able to choose different CPU types for nodes within the same cluster, which doesn't represent a change in architecture.
upvoted 1 times
...
...
tavva_prudhvi
2 years, 3 months ago
Selected Answer: D
increasing the max_batch_size TensorFlow Serving parameter, is not the best choice because increasing the batch size may not necessarily improve latency. In fact, it may even lead to higher latency for individual requests, as they will have to wait for the batch to be filled before processing. This may be useful when optimizing for throughput, but not for serving latency, which is the primary goal in this scenario.
upvoted 2 times
...
harithacML
2 years, 4 months ago
Selected Answer: D
max_batch_size parameter controls the maximum number of requests that can be batched together by TensorFlow Serving. Increasing this parameter can help reduce the number of round trips between the client and server, which can improve serving latency. However, increasing the batch size too much can lead to higher memory usage and longer processing times for each batch.
upvoted 2 times
...
Liting
2 years, 4 months ago
Selected Answer: D
Definitely D. To improve the serving latency of an ML model on AI Platform, you can recompile TensorFlow Serving from source to support CPU-specific optimizations and instruct GKE to choose an appropriate baseline minimum CPU platform for serving nodes; this way GKE will schedule the pods on nodes with at least that CPU platform.
upvoted 2 times
...
M25
2 years, 6 months ago
Selected Answer: D
Went with D
upvoted 2 times
...
SergioRubiano
2 years, 7 months ago
Selected Answer: A
A is correct. max_batch_size TensorFlow Serving parameter
upvoted 2 times
...
Yajnas_arpohc
2 years, 7 months ago
Selected Answer: A
CPU-only, one approach: if your system is CPU-only (no GPU), then consider starting with the following values: num_batch_threads equal to the number of CPU cores; max_batch_size to a really high value; batch_timeout_micros to 0. Then experiment with batch_timeout_micros values in the 1-10 millisecond (1000-10000 microsecond) range, while keeping in mind that 0 may be the optimal value. https://github.com/tensorflow/serving/tree/master/tensorflow_serving/batching
upvoted 3 times
frangm23
2 years, 6 months ago
In that very link, what it says is that max_batch_size is the parameter that governs the latency/throughput tradeoff, and as I understand it, the higher the batch size, the higher the throughput, but that doesn't ensure that latency will be lower. I would go with D
upvoted 4 times
...
...
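For reference, the parameters being debated (max_batch_size, max_enqueued_batches, batch_timeout_micros) are set in a batching parameters file passed to TensorFlow Serving via --batching_parameters_file. A sketch following the CPU-only guidance quoted above; the values are illustrative starting points, not recommendations:

```
num_batch_threads { value: 8 }        # roughly the number of CPU cores
max_batch_size { value: 1024 }        # "a really high value" per the CPU-only guidance
batch_timeout_micros { value: 0 }     # then experiment in the 1000-10000 range
max_enqueued_batches { value: 100 }   # bounds queueing delay and memory use
```

Tuning these changes how requests are scheduled on the existing CPUs, whereas option D changes how efficiently each scheduled batch executes.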

Topic 1 Question 33


Exam Professional Machine Learning Engineer topic 1 question 33 discussion

You have a demand forecasting pipeline in production that uses Dataflow to preprocess raw data prior to model training and prediction. During preprocessing, you employ Z-score normalization on data stored in BigQuery and write it back to BigQuery. New training data is added every week. You want to make the process more efficient by minimizing computation time and manual intervention. What should you do?

  • A. Normalize the data using Google Kubernetes Engine.
  • B. Translate the normalization algorithm into SQL for use with BigQuery.
  • C. Use the normalizer_fn argument in TensorFlow's Feature Column API.
  • D. Normalize the data with Apache Spark using the Dataproc connector for BigQuery.
Suggested Answer: B 🗳️

Comments

maartenalexander
Highly Voted 3 years, 10 months ago
B, I think. BigQuery definitely minimizes computation time for normalization, and I think it would also minimize manual intervention. For data normalization in Dataflow you'd have to pass the values of the mean and standard deviation as a side input. That seems like more work than a simple SQL query
upvoted 22 times
93alejandrosanchez
3 years, 6 months ago
I agree that B would definitely get the job done. But wouldn't D work as well and keep all the data pre-processing in Dataflow?
upvoted 2 times
kaike_reis
3 years, 6 months ago
Dataflow uses Beam, unlike Dataproc, which uses Spark. I think that D would be wrong because we would add one more service to the pipeline for a simple transformation (subtract the mean and divide by the std).
upvoted 4 times
...
...
...
OpenKnowledge
Most Recent 2 months, 1 week ago
Selected Answer: B
In BigQuery, z-score normalization can be achieved using the ML.STANDARD_SCALER function
upvoted 1 times
...
PhilipKoku
11 months, 1 week ago
Selected Answer: B
B) Using BigQuery
upvoted 1 times
...
Sum_Sum
1 year, 5 months ago
Selected Answer: B
Z-scores are very easy to compute in BQ - no need for more complex solutions
upvoted 2 times
...
elenamatay
1 year, 8 months ago
B. All that maartenalexander said, + BigQuery already has a function for that: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-standard-scaler , we could even schedule the query for calculating this automatically :)
upvoted 3 times
...
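For concreteness, the z-score translation under discussion is a one-liner with analytic window functions. A sketch in plain Python showing the computation, with a hypothetical BigQuery equivalent as a string (the table and column names are illustrative, not from the question):

```python
import statistics

values = [10.0, 12.0, 14.0, 16.0, 18.0]

# What the SQL translation computes, expressed in plain Python:
mu = statistics.mean(values)       # AVG(x) OVER ()
sigma = statistics.pstdev(values)  # STDDEV_POP(x) OVER ()
z_scores = [(v - mu) / sigma for v in values]

# A hypothetical BigQuery equivalent using analytic functions:
sql = """
SELECT x, (x - AVG(x) OVER ()) / STDDEV_POP(x) OVER () AS x_z
FROM `project.dataset.training_data`
"""
```

Scheduling such a query to run weekly, when the new training data lands, removes the manual step entirely.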
aaggii
1 year, 10 months ago
Selected Answer: C
Every week, when new data is loaded, the mean and standard deviation are calculated for it and passed as parameters to compute the z-score at serving time. https://towardsdatascience.com/how-to-normalize-features-in-tensorflow-5b7b0e3a4177
upvoted 1 times
tavva_prudhvi
1 year, 9 months ago
However, in the given scenario, you are using Dataflow for preprocessing and BigQuery for storing data. To make the process more efficient by minimizing computation time and manual intervention, you should still opt for option B: translate the normalization algorithm into SQL for use with BigQuery. This way, you can perform the normalization directly in BigQuery, which will save time and resources compared to using an external tool.
upvoted 1 times
...
...
SamuelTsch
1 year, 10 months ago
Selected Answer: B
A and D usually need additional configuration, which could cost much more time.
upvoted 1 times
...
M25
2 years ago
Selected Answer: B
Went with B
upvoted 2 times
...
SergioRubiano
2 years, 1 month ago
Selected Answer: B
Best way is B
upvoted 2 times
...
Fatiy
2 years, 2 months ago
Selected Answer: D
Option D is the best solution because Apache Spark provides a distributed computing platform that can handle large-scale data processing with ease. By using the Dataproc connector for BigQuery, Spark can read data directly from BigQuery and perform the normalization process in a distributed manner. This can significantly reduce computation time and manual intervention. Option A is not a good solution because Kubernetes is a container orchestration platform that does not directly provide data normalization capabilities. Option B is not a good solution because Z-score normalization is a data transformation technique that cannot be easily translated into SQL. Option C is not a good solution because the normalizer_fn argument in TensorFlow's Feature Column API is only applicable for feature normalization during model training, not for data preprocessing.
upvoted 2 times
...
ares81
2 years, 4 months ago
Selected Answer: B
Best way to proceed is B.
upvoted 2 times
Fatiy
2 years, 2 months ago
SQL is not as flexible as other programming languages like Python, which can limit the ability to customize the normalization process or incorporate new features in the future.
upvoted 1 times
...
...
Mohamed_Mossad
2 years, 11 months ago
Selected Answer: B
B is the most efficient as you will not load --> process --> save , no you will only write some sql in bigquery and voila :D
upvoted 4 times
...
baimus
3 years, 1 month ago
It's B, bigquery can do this internally, no need for dataflow
upvoted 2 times
Fatiy
2 years, 2 months ago
SQL is not as flexible as other programming languages like Python, which can limit the ability to customize the normalization process or incorporate new features in the future.
upvoted 1 times
...
...
xiaoF
3 years, 3 months ago
Selected Answer: B
I agree with B.
upvoted 2 times
...
alashin
3 years, 10 months ago
B. I agree with B as well.
upvoted 3 times
...

Topic 1 Question 34


Exam Professional Machine Learning Engineer topic 1 question 34 discussion

You need to design a customized deep neural network in Keras that will predict customer purchases based on their purchase history. You want to explore model performance using multiple model architectures, store training data, and be able to compare the evaluation metrics in the same dashboard. What should you do?

  • A. Create multiple models using AutoML Tables.
  • B. Automate multiple training runs using Cloud Composer.
  • C. Run multiple training jobs on AI Platform with similar job names.
  • D. Create an experiment in Kubeflow Pipelines to organize multiple runs.
Suggested Answer: D 🗳️

Comments

ralf_cc
Highly Voted 3 years, 10 months ago
D - https://www.kubeflow.org/docs/about/use-cases/
upvoted 14 times
...
tavva_prudhvi
Highly Voted 1 year, 10 months ago
Selected Answer: D
The best approach is to create an experiment in Kubeflow Pipelines to organize multiple runs. Option A is incorrect because AutoML Tables is a managed machine learning service that automates the process of building machine learning models from tabular data. It does not provide the flexibility to customize the model architecture or explore multiple model architectures. Option B is incorrect because Cloud Composer is a managed workflow orchestration service that can be used to automate machine learning workflows. However, it does not provide the same level of flexibility or scalability as Kubeflow Pipelines. Option C is incorrect because running multiple training jobs on AI Platform with similar job names will not allow you to easily organize and compare the results.
upvoted 7 times
...
PhilipKoku
Most Recent 11 months, 1 week ago
Selected Answer: D
D) Experiments is the way forward
upvoted 1 times
...
tikka0804
1 year, 5 months ago
I would vote for D but if C had said instead "different job names" .. would that have been a better option?
upvoted 2 times
...
Sum_Sum
1 year, 5 months ago
Selected Answer: D
D - everything else is just nonsense
upvoted 1 times
...
SamuelTsch
1 year, 10 months ago
Selected Answer: D
D should be correct
upvoted 2 times
...
Liting
1 year, 10 months ago
Selected Answer: D
C has similar job names, which makes it wrong, so the correct answer should be D
upvoted 1 times
...
M25
2 years ago
Selected Answer: D
Went with D
upvoted 1 times
...
Fatiy
2 years, 2 months ago
Selected Answer: D
With Kubeflow Pipelines, you can create experiments that help you keep track of multiple training runs with different model architectures and hyperparameters.
upvoted 1 times
...
mymy9418
2 years, 4 months ago
Selected Answer: C
https://cloud.google.com/vertex-ai/docs/experiments/user-journey/uj-compare-models
upvoted 2 times
...
suresh_vn
2 years, 8 months ago
D. Option C does not work since CAIP has been updated to Vertex AI
upvoted 1 times
...
Mohamed_Mossad
2 years, 10 months ago
Selected Answer: D
https://www.kubeflow.org/docs/components/pipelines/concepts/experiment/ https://www.kubeflow.org/docs/components/pipelines/concepts/run/
upvoted 1 times
...
mmona19
3 years ago
Selected Answer: D
D. We need to use the experiments feature to compare models; having different job names is not going to help track experiments.
upvoted 3 times
...
sid515
3 years, 3 months ago
C for me. It only talks about experimentation; that's where AI Platform fits better.
upvoted 2 times
...
NamitSehgal
3 years, 4 months ago
Selected Answer: C
"Similar job names" creates a bit of confusion, as we certainly cannot use the same job names. D sounds better, but only in Vertex AI during the experimentation phase.
upvoted 1 times
...
kfrd
3 years, 6 months ago
C, anyone? D seems like overkill to me.
upvoted 4 times
kaike_reis
3 years, 6 months ago
(C) presents the most specific solution for what the question asks: experimenting with models and comparing them. All of this is possible with AI Platform. Furthermore, the question only speaks of experimentation; Kubeflow would be more powerful if an end-to-end pipeline were needed.
upvoted 3 times
...
...
Danny2021
3 years, 8 months ago
D. The new Vertex AI now supports experimentation with hyperparameter tuning.
upvoted 4 times
tavva_prudhvi
1 year, 9 months ago
How can we track the progress of each run and compare the results in the vertex AI dashboard?
upvoted 1 times
...
...

Topic 1 Question 35


Exam Professional Machine Learning Engineer topic 1 question 35 discussion

You are developing a Kubeflow pipeline on Google Kubernetes Engine. The first step in the pipeline is to issue a query against BigQuery. You plan to use the results of that query as the input to the next step in your pipeline. You want to achieve this in the easiest way possible. What should you do?

  • A. Use the BigQuery console to execute your query, and then save the query results into a new BigQuery table.
  • B. Write a Python script that uses the BigQuery API to execute queries against BigQuery. Execute this script as the first step in your Kubeflow pipeline.
  • C. Use the Kubeflow Pipelines domain-specific language to create a custom component that uses the Python BigQuery client library to execute queries.
  • D. Locate the Kubeflow Pipelines repository on GitHub. Find the BigQuery Query Component, copy that component's URL, and use it to load the component into your pipeline. Use the component to execute queries against BigQuery.
Suggested Answer: D 🗳️

Comments

maartenalexander
Highly Voted 4 years, 4 months ago
D. Kubeflow pipelines have different types of components, ranging from low- to high-level. They have a ComponentStore that allows you to access prebuilt functionality from GitHub.
upvoted 24 times
gcp2021go
4 years, 3 months ago
agree, links: https://github.com/kubeflow/pipelines/blob/master/components/gcp/bigquery/query/sample.ipynb; https://v0-5.kubeflow.org/docs/pipelines/reusable-components/
upvoted 6 times
...
...
NamitSehgal
Highly Voted 3 years, 10 months ago
Selected Answer: D
Not sure what the reason is behind putting A, as it is manual, and manual steps cannot be part of automation. I would say the answer is D, as it just requires cloning the component from GitHub. Using Python and importing a BigQuery component may sound good too, but the ask was what is easiest. It depends on how each individual takes the word "easy", but it is definitely not A.
upvoted 7 times
...
b7ad1d9
Most Recent 1 month, 3 weeks ago
Selected Answer: D
D. The recommended approach by Google Cloud for integrating BigQuery (BQ) data into a Kubeflow Pipeline is to use the Google Cloud Pipeline Components (GCPC), which include purpose-built components for BigQuery operations. This is also the most automated option.
upvoted 1 times
...
taksan
1 year, 2 months ago
Selected Answer: D
D is the correct answer, as reusing an existing component is the most streamlined way to interact with BigQuery.
upvoted 2 times
...
nktyagi
1 year, 3 months ago
Selected Answer: B
much simpler to just write a couple of lines of python
upvoted 2 times
desertlotus1211
1 year ago
Writing a Python script using the BigQuery API is possible, but it's more complex than using an existing component. It requires more development effort and doesn't take advantage of the pre-built components available in Kubeflow.
upvoted 2 times
...
...
jsalvasoler
1 year, 3 months ago
Selected Answer: B
Clearly B
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: B
B) Python API
upvoted 3 times
...
Amabo
1 year, 6 months ago
```python
from kfp.components import load_component_from_url

bigquery_query_op = load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/master/components/gcp/bigquery/query/component.yaml')

def my_pipeline():
    query_result = bigquery_query_op(
        project_id='my-project',
        query='SELECT * FROM my_dataset.my_table')
    # Use the query_result as input to the next step in the pipeline
```
upvoted 4 times
...
fragkris
1 year, 11 months ago
Selected Answer: B
I'm going "against the flow" and choosing B. It just sounds like a much easier option than D.
upvoted 3 times
...
friedi
2 years, 4 months ago
Selected Answer: B
Very confused as to why D is the correct answer. To me it seems a) much simpler to just write a couple of lines of python (https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-python) and b) the documentation for the BigQuery reusable component (https://v0-5.kubeflow.org/docs/pipelines/reusable-components/) states that the data is written to Google Cloud Storage, which means we have to write the fetching logic in the next pipeline step, going against the "as simple as possible" requirement. Would be interested to hear why I am wrong.
upvoted 3 times
friedi
2 years, 4 months ago
Actually, the problem statement even says that the query result has to be used as input to the next step, meaning with answer D) we would have to download the results before passing them to the next step. Additionally, we would have to handle potentially existing files in Google Cloud Storage if the pipeline is either executed multiple times or even in parallel. (I will die on this hill 😆 ).
upvoted 2 times
...
tavva_prudhvi
2 years ago
Yup, you raised valid points. Depending on your specific requirements and familiarity with Python, writing a custom script using the BigQuery API (Option B) can be a simpler and more flexible approach. With Option B, you can write a Python script that uses the BigQuery API to execute queries against BigQuery and fetch the data directly into your pipeline. This way, you can process the data as needed and pass it to the next step in the pipeline without the need to fetch it from Google Cloud Storage. While using the reusable BigQuery Query Component (Option D) provides a pre-built solution, it does require additional steps to fetch the data from Google Cloud Storage for the next step in the pipeline, which might not be the simplest approach.
upvoted 2 times
...
...
M25
2 years, 6 months ago
Selected Answer: D
Went with D
upvoted 2 times
...
Mohamed_Mossad
3 years, 4 months ago
Selected Answer: D
https://linuxtut.com/en/f4771efee37658c083cc/
upvoted 2 times
Mohamed_Mossad
3 years, 4 months ago
answer between C,D but above link has an article which uses a ready .yml file for bigquery component on official kubeflow pipelines repo
upvoted 1 times
...
...
David_ml
3 years, 6 months ago
Selected Answer: D
Answer is D.
upvoted 3 times
...
donchoripan
3 years, 7 months ago
A. It says the easiest way possible, so it sounds like just running the query in the console should be enough. It doesn't say that the data will need to be uploaded again anytime soon, so we can assume that it's just a one-time query to be run.
upvoted 1 times
David_ml
3 years, 6 months ago
A is wrong. The answer is D. It's a pipeline, which means you will run it multiple times. Do you always want to make the query manually each time you run your pipeline?
upvoted 3 times
...
...
xiaoF
3 years, 9 months ago
D is good.
upvoted 3 times
...
aepos
3 years, 11 months ago
The result of D is just the path to the Cloud Storage where the result is stored not the data itself. So the input to the next step is this path, where you still have to load the data? So i would guess B. Can anyone explain if i am wrong?
upvoted 2 times
...
kaike_reis
3 years, 12 months ago
D. The easiest way possible in the developer's world: copy code from Stack Overflow or GitHub, hahaha. Jokes apart, I think D is correct. (A) is manual, so you have to do it every time. (B) could work, but it is not the easiest because you need to write a script. (C) uses Kubeflow's internal solution, but you need to work to create a custom component. (D) is the (C) solution, but easier, using a previously created component to do the job.
upvoted 3 times
...

Topic 1 Question 36


You are building a model to predict daily temperatures. You split the data randomly and then transformed the training and test datasets. Temperature data for model training is uploaded hourly. During testing, your model performed with 97% accuracy; however, after deploying to production, the model's accuracy dropped to 66%. How can you make your production model more accurate?

  • A. Normalize the data for the training, and test datasets as two separate steps.
  • B. Split the training and test data based on time rather than a random split to avoid leakage.
  • C. Add more data to your test set to ensure that you have a fair distribution and sample for testing.
  • D. Apply data transformations before splitting, and cross-validate to make sure that the transformations are applied to both the training and test sets.
Suggested Answer: B 🗳️

Comments

maartenalexander
Highly Voted 4 years, 4 months ago
B. If you do time series prediction, you can't borrow information from the future to predict the future. If you do, you are artificially increasing your accuracy.
upvoted 37 times
...
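The leakage point in the top comment can be illustrated with a minimal, stdlib-only sketch (synthetic temperature data; the split helper is hypothetical, not from the exam material):

```python
from datetime import datetime, timedelta

# Synthetic hourly temperature records: (timestamp, temperature)
records = [
    (datetime(2023, 1, 1) + timedelta(hours=h), 10.0 + (h % 24) * 0.5)
    for h in range(1000)
]

def time_based_split(rows, train_fraction=0.8):
    """Chronological split: train on the past, test on the future.
    This avoids the leakage that a random split introduces for time series,
    where the test set would contain points interleaved with training points."""
    rows = sorted(rows, key=lambda r: r[0])
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

train, test = time_based_split(records)
# Every training timestamp precedes every test timestamp:
assert max(t for t, _ in train) < min(t for t, _ in test)
```

With a random 80/20 split the final assertion would fail: future hours would leak into training, which is exactly why the deployed model's accuracy dropped.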
OpenKnowledge
Most Recent 2 months, 1 week ago
Selected Answer: B
The core principle is to use historical data (the training set) to build a model and then evaluate its performance on future, unseen data (the test set). So, the dataset must be split into training and testing sets based on time (temporal split/out-of-time split/chronological split). This approach simulates real-world scenarios where models must make predictions on data that occurred after they were trained, making it especially useful for time series forecasting.
upvoted 1 times
...
desertlotus1211
10 months, 2 weeks ago
Selected Answer: B
D is incorrect: Applying transformations before splitting is important, but it does not resolve the issue of time leakage. Even if transformations are done correctly, the random split will still lead to inflated test accuracy and poor production performance. This option focuses on correct data processing, but it does not address the leakage caused by random splitting in time series data.
upvoted 1 times
...
baimus
1 year, 2 months ago
Selected Answer: D
It's D
upvoted 1 times
baimus
1 year, 2 months ago
B I mean. Sorry I wrote that comment very early and there is no delete key!
upvoted 1 times
...
...
jsalvasoler
1 year, 3 months ago
Selected Answer: B
temporal split is a must in time series forecasting evaluation
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: B
B) Time split to avoid leaking data.
upvoted 1 times
...
fragkris
1 year, 11 months ago
Selected Answer: B
Definetely B
upvoted 1 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: B
they did not explicitly say forecasting, but splitting by time is the number one rule you learn
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 1 times
...
SergioRubiano
2 years, 7 months ago
Selected Answer: D
D is correct. cross-validate
upvoted 2 times
...
Mohamed_Mossad
3 years, 5 months ago
Selected Answer: B
train accuracy 97% , production accuracy 66% ---> time series data ---> random split ---> cause leakage , answer is B
upvoted 2 times
...
David_ml
3 years, 6 months ago
Selected Answer: B
You don't split data randomly for time series prediction.
upvoted 3 times
...
mmona19
3 years, 7 months ago
Selected Answer: B
B should be the answer. D is incorrect, as normalizing before splitting is going to cause data leakage: https://community.rapidminer.com/discussion/32592/normalising-data-before-data-split-or-after
upvoted 2 times
...
giaZ
3 years, 8 months ago
Selected Answer: B
If you do a random split on a time series, you risk that the training data will contain information about the target (the definition of leakage), but similar data won't be available when the model is used for prediction. Leakage causes the model to look accurate until you start making actual predictions with it.
upvoted 3 times
...
xiaoF
3 years, 9 months ago
agree B as well
upvoted 2 times
...
JobQ
3 years, 10 months ago
I think is B
upvoted 2 times
...
Danny2021
4 years, 2 months ago
B. D doesn't improve anything at all. Split and Transform is no different than Transform and Split if the transform logic is the same.
upvoted 3 times
...

Topic 1 Question 37


You are developing models to classify customer support emails. You created models with TensorFlow Estimators using small datasets on your on-premises system, but you now need to train the models using large datasets to ensure high performance. You will port your models to Google Cloud and want to minimize code refactoring and infrastructure overhead for easier migration from on-prem to cloud. What should you do?

  • A. Use AI Platform for distributed training.
  • B. Create a cluster on Dataproc for training.
  • C. Create a Managed Instance Group with autoscaling.
  • D. Use Kubeflow Pipelines to train on a Google Kubernetes Engine cluster.
Suggested Answer: A 🗳️

Comments

maartenalexander
Highly Voted 3 years, 10 months ago
A. AI platform provides lower infrastructure overhead and allows you to not have to refactor your code too much (no containerization and such, like in KubeFlow).
upvoted 30 times
...
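As the comment above notes, AI Platform runs existing Estimator code largely unchanged, so scaling out is mostly a configuration concern rather than a code change. A hypothetical job configuration sketch (job name, module, package path, and region are placeholders):

```yaml
# config.yaml -- passed to a training-job submission such as:
#   gcloud ai-platform jobs submit training my_job \
#     --module-name trainer.task --package-path trainer/ \
#     --region us-central1 --config config.yaml
trainingInput:
  scaleTier: STANDARD_1  # predefined tier: one master plus workers and parameter servers
```

The Estimator's `train_and_evaluate` loop picks up the distribution topology from the environment AI Platform provides, which is what keeps refactoring minimal.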
PhilipKoku
Most Recent 11 months, 1 week ago
Selected Answer: A
A) AI Platform
upvoted 1 times
...
girgu
11 months, 2 weeks ago
The most suitable option for minimizing code refactoring and infrastructure overhead while enabling large-scale training on Google Cloud is: A. Use AI Platform for distributed training.
* **Simplified Workflow:** AI Platform offers a managed service for training machine learning models. You can train your existing TensorFlow Estimator code with minimal changes, reducing the need for extensive code refactoring.
* **Distributed Training:** AI Platform automatically handles distributing your training job across multiple machines, allowing you to leverage the power of Google's cloud infrastructure to train on large datasets efficiently.
* **Reduced Infrastructure Overhead:** You don't need to manage the underlying infrastructure (e.g., setting up and maintaining a cluster) yourself. AI Platform takes care of all the infrastructure provisioning and management, minimizing the workload on your team.
upvoted 2 times
...
fragkris
1 year, 5 months ago
Selected Answer: A
I chose A. Even though D is a working option, it requires us to create a GKE cluster, which requires more work.
upvoted 2 times
...
Sum_Sum
1 year, 5 months ago
Selected Answer: A
A - because it has native support for TF
upvoted 1 times
...
harithacML
1 year, 10 months ago
Selected Answer: A
A. Use AI Platform for distributed training: managed, migration with little infrastructure change — yes, although it may need some code refactoring.
B. Create a cluster on Dataproc for training: only a cluster? What about training?
C. Create a Managed Instance Group with autoscaling: same question.
D. Use Kubeflow Pipelines to train on a Google Kubernetes Engine cluster: only training?
upvoted 2 times
...
M25
2 years ago
Selected Answer: A
Went with A
upvoted 1 times
...
Fatiy
2 years, 2 months ago
Selected Answer: A
Option A is the best choice as AI Platform provides a distributed training framework, enabling you to train large-scale models faster and with less effort
upvoted 1 times
...
Mohamed_Mossad
2 years, 11 months ago
Selected Answer: A
Using option elimination, the answer is between A and D; I will vote for A as it is easier.
upvoted 1 times
...
David_ml
3 years ago
Selected Answer: A
The answer is A. AI platform also contains kubeflow pipelines. you don't need to set up infrastructure to use it. For D you need to set up a kubernetes cluster engine. The question asks us to minimize infrastructure overheard.
upvoted 2 times
...
mmona19
3 years ago
Selected Answer: D
D — Kubeflow Pipelines with Vertex AI gives you the ability to reuse existing code using a TF container in a pipeline, and it helps automate the process; there is a Qwiklab walking through this. A — incorrect: the question asks about reusing existing code with minimum changes, and distributed deployment does not address that.
upvoted 1 times
David_ml
3 years ago
The answer is A. AI platform also contains kubeflow pipelines. you don't need to set up infrastructure to use it. For D you need to set up a kubernetes cluster engine. The question asks us to minimize infrastructure overheard.
upvoted 2 times
...
...
A4M
3 years, 3 months ago
A - better to go with managed service and distributed
upvoted 2 times
...
DHEEPAK
3 years, 3 months ago
I am 100% sure that the answer is D. Kubeflow pipelines were designed keeping: A) Portability. B) Composability. C) Flexibility in mind. This is the pain point that the kubeflow pipelines address
upvoted 1 times
David_ml
3 years ago
The answer is A. AI platform also contains kubeflow pipelines. you don't need to set up infrastructure to use it. For D you need to set up a kubernetes cluster engine. The question asks us to minimize infrastructure overheard.
upvoted 2 times
...
...
NamitSehgal
3 years, 4 months ago
Selected Answer: A
TensorFlow Estimators support distributed training, and that is a key feature of AI Platform (later Vertex AI).
upvoted 3 times
...
JobQ
3 years, 4 months ago
I think is A
upvoted 1 times
...
q4exam
3 years, 7 months ago
I think the answer is either A or B, but I personally think it is likely B, because Dataproc is a common toolbox on GCP used for ML, while AI Platform might require refactoring. However, I don't really know whether it's A or B.
upvoted 3 times
george_ognyanov
3 years, 7 months ago
Another vote for answer A, AI Platform distributed training. However, I wanted to share my logic for why it's not B as well. Dataproc is managed Hadoop and as such needs a processing engine for ML tasks — most likely Spark and SparkML. Now, Spark code is quite different from pure Python, and SparkML is even more different from TF code. I imagine there might be a way to convert TF code to run on SparkML, but this seems like a lot of work. And besides, the question specifically wants us to minimize refactoring, so there you have it: we can eliminate option B 100%.
upvoted 5 times
...
...

Topic 1 Question 38


You have trained a text classification model in TensorFlow using AI Platform. You want to use the trained model for batch predictions on text data stored in BigQuery while minimizing computational overhead. What should you do?

  • A. Export the model to BigQuery ML.
  • B. Deploy and version the model on AI Platform.
  • C. Use Dataflow with the SavedModel to read the data from BigQuery.
  • D. Submit a batch prediction job on AI Platform that points to the model location in Cloud Storage.
Suggested Answer: A 🗳️

Comments

maartenalexander
Highly Voted 4 years, 4 months ago
A. You would want to minimize computational overhead–BigQuery minimizes such overhead
upvoted 21 times
q4exam
4 years, 1 month ago
BQML doesn't support NLP models
upvoted 3 times
ms_lemon
4 years, 1 month ago
you can import a TF model in BQ ML
upvoted 11 times
gcp2021go
4 years ago
agree. https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models
upvoted 6 times
...
...
harithacML
2 years, 4 months ago
No need. This is a text classification problem: you need to convert words to numbers and use a classifier.
upvoted 3 times
...
...
...
chohan
Highly Voted 4 years, 4 months ago
I think it's A https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models#importing_models
upvoted 11 times
...
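The import route in the linked docs can be sketched as follows. The statements are only built as strings here; the dataset, bucket, and column names are hypothetical, and actually running them would require a BigQuery client and credentials:

```python
# Hypothetical names throughout; the statement shapes follow the BigQuery ML
# docs on imported TensorFlow models linked above.
create_stmt = """
CREATE OR REPLACE MODEL `my_dataset.imported_tf_model`
OPTIONS (MODEL_TYPE='TENSORFLOW',
         MODEL_PATH='gs://my-bucket/exported_model/*')
"""

predict_stmt = """
SELECT *
FROM ML.PREDICT(
  MODEL `my_dataset.imported_tf_model`,
  (SELECT text AS input FROM `my_dataset.documents`))
"""

# To actually run them (requires google-cloud-bigquery and GCP credentials):
# from google.cloud import bigquery
# client = bigquery.Client()
# client.query(create_stmt).result()
```

Because prediction runs where the data already lives, no data export or serving infrastructure is needed, which is the "minimize computational overhead" argument for A.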
OpenKnowledge
Most Recent 1 month, 1 week ago
Selected Answer: A
It is possible to import and use TensorFlow models within BigQuery ML. This lets you leverage BigQuery ML's inference capabilities and co-location with your data for predictions using custom TensorFlow models.
upvoted 2 times
...
bc3f222
8 months ago
Selected Answer: A
BQML can run imported TF models
upvoted 2 times
...
Sivaram06
10 months, 1 week ago
Selected Answer: D
Not option A: while BigQuery ML can be useful for certain tasks, it might not be the most efficient for batch predictions with a custom TensorFlow model trained on AI Platform.
upvoted 1 times
...
desertlotus1211
10 months, 2 weeks ago
Selected Answer: D
A is incorrect: BigQuery ML is used to train and deploy models directly within BigQuery, but it does not support importing and deploying external TensorFlow models. You cannot export a TensorFlow model directly to BigQuery ML; AI Platform is the correct service for TensorFlow-based models.
upvoted 1 times
desertlotus1211
10 months, 2 weeks ago
However, BQ ML requires storing the model in Cloud Storage first. The question doesn't state this (should we assume it?), which makes answer D better, as it states Cloud Storage.
upvoted 1 times
...
...
PhilipKoku
1 year, 5 months ago
Selected Answer: A
A) BigQuery ML
upvoted 1 times
...
girgu
1 year, 5 months ago
Selected Answer: D
Use the gcloud command to submit a batch prediction job, specifying the model location in Cloud Storage and the BigQuery table as the input source.
upvoted 1 times
Jason_Cloud_at
1 year, 2 months ago
In option D, it just mentions GCS; BQ is nowhere to be found.
upvoted 2 times
...
...
Aastha_Vashist
1 year, 7 months ago
Selected Answer: A
BigQuery to minimize computational overhead
upvoted 2 times
...
MrTracer
1 year, 10 months ago
Selected Answer: D
Would go with D
upvoted 1 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: A
A - you can import TF models to BQ
upvoted 2 times
...
harithacML
2 years, 4 months ago
Selected Answer: A
Model: AI Platform. Batch prediction data: BigQuery. Constraint: computational overhead. Keeping the model on the same platform as the data == less computation required to load the data and pass it to the model.
upvoted 2 times
...
Liting
2 years, 4 months ago
Selected Answer: A
minimize computational overhead–>BigQuery
upvoted 2 times
...
Voyager2
2 years, 5 months ago
Not sure whether having the saved model in Cloud Storage means that you don't use compute in Vertex. I think the compute-free option is BigQuery.
upvoted 1 times
...
Voyager2
2 years, 5 months ago
Not sure Text Classification Using BigQuery ML and ML.NGRAMS https://medium.com/@jeffrey.james/text-classification-using-bigquery-ml-and-ml-ngrams-6e365f0b5505
upvoted 1 times
...
rexduo
2 years, 5 months ago
Selected Answer: A
I think D has extra compute for extracting data from BQ.
upvoted 2 times
...
Darshan12
2 years, 5 months ago
There are some drawbacks to option D.
Cost: Submitting a batch prediction job on AI Platform is a paid service. The cost will depend on the size of the model and the amount of data that you are predicting.
Complexity: Submitting a batch prediction job on AI Platform requires you to write some code. This can be a challenge if you are not familiar with AI Platform.
Performance: Submitting a batch prediction job on AI Platform may not be as efficient as using BigQuery ML. This is because AI Platform needs to load the model into memory before it can run the predictions.
Overall, option D is a viable option, but it may not be the best option for all situations.
upvoted 2 times
...

Topic 1 Question 39


You work with a data engineering team that has developed a pipeline to clean your dataset and save it in a Cloud Storage bucket. You have created an ML model and want to use the data to refresh your model as soon as new data is available. As part of your CI/CD workflow, you want to automatically run a Kubeflow Pipelines training job on Google Kubernetes Engine (GKE). How should you architect this workflow?

  • A. Configure your pipeline with Dataflow, which saves the files in Cloud Storage. After the file is saved, start the training job on a GKE cluster.
  • B. Use App Engine to create a lightweight python client that continuously polls Cloud Storage for new files. As soon as a file arrives, initiate the training job.
  • C. Configure a Cloud Storage trigger to send a message to a Pub/Sub topic when a new file is available in a storage bucket. Use a Pub/Sub-triggered Cloud Function to start the training job on a GKE cluster.
  • D. Use Cloud Scheduler to schedule jobs at a regular interval. For the first step of the job, check the timestamp of objects in your Cloud Storage bucket. If there are no new files since the last run, abort the job.
Suggested Answer: C 🗳️

Comments

Paul_Dirac
Highly Voted 3 years, 10 months ago
C https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build#triggering-and-scheduling-kubeflow-pipelines
upvoted 17 times
ori5225
3 years, 9 months ago
On a schedule, using Cloud Scheduler. Responding to an event, using Pub/Sub and Cloud Functions. For example, the event can be the availability of new data files in a Cloud Storage bucket.
upvoted 1 times
tavva_prudhvi
1 year, 10 months ago
Option D requires the job to be scheduled at regular intervals, even if there are no new files. This can waste resources and lead to unnecessary delays in the training process.
upvoted 1 times
...
...
...
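The event-driven flow in option C can be sketched as a Pub/Sub-triggered function. The endpoint, pipeline package, and bucket names are hypothetical, and the Kubeflow Pipelines call is shown commented out since it needs a live cluster:

```python
import base64
import json

def trigger_pipeline(event, context=None):
    """Background Cloud Function triggered by the Pub/Sub message that a
    Cloud Storage notification publishes when a new file lands in the bucket."""
    # Pub/Sub delivers the storage notification payload base64-encoded.
    message = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    gcs_uri = f"gs://{message['bucket']}/{message['name']}"
    # In the real function you would now launch the Kubeflow Pipelines run,
    # e.g. (endpoint and package name are placeholders):
    # import kfp
    # client = kfp.Client(host="https://<your-kfp-endpoint>")
    # client.create_run_from_pipeline_package(
    #     "training_pipeline.yaml", arguments={"data_path": gcs_uri})
    return gcs_uri
```

A local call with a fake notification shows the plumbing: `trigger_pipeline({"data": base64.b64encode(json.dumps({"bucket": "clean-data", "name": "part-0001.csv"}).encode())})` yields the GCS URI to pass to the training job.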
PhilipKoku
Most Recent 11 months, 1 week ago
Selected Answer: C
C) PUB/sub trigger from Cloud Storage & Cloud Function
upvoted 1 times
...
fragkris
1 year, 5 months ago
Selected Answer: C
C — this is the Google-recommended method.
upvoted 1 times
...
Sum_Sum
1 year, 5 months ago
Selected Answer: C
C- because you don't want to re-engineer the pipeline
upvoted 1 times
...
M25
2 years ago
Selected Answer: C
Went with C
upvoted 1 times
...
Fatiy
2 years, 2 months ago
Selected Answer: C
The scenario involves automatically running a Kubeflow Pipelines training job on GKE as soon as new data becomes available. To achieve this, we can use Cloud Storage to store the cleaned dataset, and then configure a Cloud Storage trigger that sends a message to a Pub/Sub topic whenever a new file is added to the storage bucket. We can then create a Pub/Sub-triggered Cloud Function that starts the training job on a GKE cluster.
upvoted 1 times
...
behzadsw
2 years, 4 months ago
Selected Answer: A
The question says: "As part of your CI/CD workflow, you want to automatically run a Kubeflow…" C is also an option, but it seems more cumbersome. One thing that could count against A is that the data engineering team is a separate team, so they might not have access to your CI/CD if any changes are needed from their side.
upvoted 1 times
tavva_prudhvi
1 year, 10 months ago
Option A requires the data engineering team to modify the pipeline, which can be time-consuming and error-prone.
upvoted 1 times
...
...
hiromi
2 years, 5 months ago
Selected Answer: C
C Pubsub is the keyword
upvoted 2 times
...
Mohamed_Mossad
2 years, 10 months ago
Selected Answer: C
Event-driven architecture is better than polling-based architecture, so I will vote for C.
upvoted 1 times
...

Topic 1 Question 40


You have a functioning end-to-end ML pipeline that involves tuning the hyperparameters of your ML model using AI Platform, and then using the best-tuned parameters for training. Hypertuning is taking longer than expected and is delaying the downstream processes. You want to speed up the tuning job without significantly compromising its effectiveness. Which actions should you take? (Choose two.)

  • A. Decrease the number of parallel trials.
  • B. Decrease the range of floating-point values.
  • C. Set the early stopping parameter to TRUE.
  • D. Change the search algorithm from Bayesian search to random search.
  • E. Decrease the maximum number of trials during subsequent training phases.
Suggested Answer: CE 🗳️

Comments

gcp2021go
Highly Voted 4 years, 3 months ago
I think should CE. I can't find any reference regarding B can reduce tuning time.
upvoted 21 times
...
Paul_Dirac
Highly Voted 4 years, 4 months ago
Answer: B & C (Ref: https://cloud.google.com/ai-platform/training/docs/using-hyperparameter-tuning)
(A) Decreasing the number of parallel trials will increase tuning time.
(D) Bayesian search works better and faster than random search, since it is selective about which points to evaluate and uses knowledge of previously evaluated points.
(E) maxTrials should be larger than 10× the number of hyperparameters used, and spanning that minimum space (10 × num_hyperparams) already takes some time, so lowering maxTrials has little effect on reducing tuning time.
upvoted 16 times
dxxdd7
4 years, 2 months ago
In your link, where they mention maxTrials, they say that "In most cases there is a point of diminishing returns after which additional trials have little or no effect on the accuracy." They also say that it can affect time and cost. I think I'd rather go with CE.
upvoted 11 times
...
Goosemoose
1 year, 5 months ago
Bayesian search can cost more time: it can converge in fewer iterations than the other algorithms, but not necessarily in less wall-clock time, because trials are dependent and therefore must run sequentially.
upvoted 1 times
...
...
bc3f222
Most Recent 8 months ago
Selected Answer: CE
Apart from early stopping, which no one has doubts about: E (reducing max trials) has the lowest propensity to reduce performance.
upvoted 2 times
...
vinevixx
10 months ago
Selected Answer: BC
Why B is correct: Decreasing the range of floating-point values reduces the search space for the hyperparameter tuning process. A smaller search space allows the algorithm to converge faster to an optimal solution by focusing only on a narrower range of values. This approach speeds up tuning without significantly compromising effectiveness, as the range is constrained to more reasonable values.
Why C is correct: Setting the early stopping parameter to TRUE enables the tuning process to stop trials early if it becomes clear that a given trial is not improving or yielding promising results. This prevents unnecessary computation and saves time by discarding underperforming configurations early in the process.
upvoted 1 times
...
Ankit267
10 months, 2 weeks ago
Selected Answer: CE
C & E are the choices
upvoted 1 times
...
TornikePirveli
1 year, 2 months ago
In the PMLE book it's grid search instead of Bayesian search, and that makes sense, but the book also marks "Decrease the number of parallel trials" as a correct answer, which I think should be wrong.
upvoted 1 times
...
nktyagi
1 year, 3 months ago
Selected Answer: AB
With Vertex AI hyperparameter tuning, you can configure the number of trials and the search algorithm, as well as the range of parameters.
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: CD
C) and D)
upvoted 3 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: CE
see pawan94
upvoted 3 times
...
pawan94
1 year, 10 months ago
C and E, if you reference the latest docs for hyperparameter-tuning jobs on Vertex AI. A is not possible (refer: https://cloud.google.com/vertex-ai/docs/training/using-hyperparameter-tuning#:~:text=the%20benefit%20of%20reducing%20the%20time%20the): if you reduce the number of parallel trials, the speed of overall completion is negatively affected. The question is about how to speed up the process, not about changing the model params. Changing the optimization algorithm would lead to unexpected results. So in my opinion C and E (after carefully reading the updated docs), and please don't believe everything ChatGPT says. I encountered so many questions where the LLMs give completely wrong answers.
upvoted 4 times
...
fragkris
1 year, 11 months ago
Selected Answer: CD
I chose C and D
upvoted 3 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: CD
ChatGPT says:
C. Set the early stopping parameter to TRUE. Early stopping allows the tuning process to terminate a trial if it becomes clear that it's not producing promising results. This prevents wasting time on unpromising trials and can significantly speed up the hyperparameter tuning process. It helps to focus resources on more promising parameter combinations.
D. Change the search algorithm from Bayesian search to random search. Random search, as opposed to Bayesian optimization, doesn't attempt to build a model of the objective function. While Bayesian search can be more efficient in finding the optimal parameters, random search is often faster per iteration. Random search can be particularly effective when the hyperparameter space is large, as it doesn't require as much computational power to select the next set of parameters to evaluate.
upvoted 4 times
...
Voyager2
2 years, 5 months ago
Selected Answer: CE
C & E. This video explains the max trials and parallel trials very well: https://youtu.be/8hZ_cBwNOss. This link explains early stopping: https://cloud.google.com/ai-platform/training/docs/using-hyperparameter-tuning#early-stopping
upvoted 4 times
...
rexduo
2 years, 5 months ago
Selected Answer: CE
A increases time; B: an HP tuning job's bottleneck is normally not model size; D does reduce time, but might significantly hurt effectiveness.
upvoted 2 times
...
CloudKida
2 years, 6 months ago
Selected Answer: AC
Running parallel trials has the benefit of reducing the time the training job takes (real time—the total processing time required is not typically changed). However, running in parallel can reduce the effectiveness of the tuning job overall. That is because hyperparameter tuning uses the results of previous trials to inform the values to assign to the hyperparameters of subsequent trials. When running in parallel, some trials start without having the benefit of the results of any trials still running. You can specify that AI Platform Training must automatically stop a trial that has become clearly unpromising. This saves you the cost of continuing a trial that is unlikely to be useful. To permit stopping a trial early, set the enableTrialEarlyStopping value in the HyperparameterSpec to TRUE.
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: CE
Went with C & E
upvoted 2 times
...
kucuk_kagan
2 years, 7 months ago
Selected Answer: AD
To speed up the tuning job without significantly compromising its effectiveness, you can take the following actions:
A. Decrease the number of parallel trials: by reducing the number of parallel trials, you can limit the amount of computational resources being used at a given time, which may help speed up the tuning job. However, reducing the number of parallel trials too much could limit the exploration of the parameter space and result in suboptimal results.
D. Change the search algorithm from Bayesian search to random search: Bayesian optimization is a computationally intensive method that requires more time and resources than random search. By switching to a simpler method like random search, you may be able to speed up the tuning job without compromising its effectiveness. However, random search may not be as efficient in finding the best hyperparameters as Bayesian optimization.
upvoted 1 times
...
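The trade-off debated in these comments (early stopping cuts trial time without changing which configuration wins, while switching to random search changes the search itself) can be illustrated with a toy random-search tuner. Everything here is invented for illustration: the objective shape, the linear learning curves, and the stop rule.

```python
import math
import random

def tune(n_trials=20, epochs=10, early_stop=True, seed=0):
    """Toy random-search tuner. Each trial samples a learning rate; its
    'training curve' rises linearly toward a quality level set by that
    rate. Early stopping abandons trials that clearly lag the best so far."""
    rng = random.Random(seed)
    best, epochs_spent = 0.0, 0
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-4, -1)                     # sampled hyperparameter
        quality = 1.0 - abs(-2.5 - math.log10(lr)) / 1.5   # toy objective, peak near lr = 3e-3
        metric = 0.0
        for epoch in range(1, epochs + 1):
            epochs_spent += 1
            metric = quality * epoch / epochs              # linear learning curve
            # Stop a trial whose curve is far below the best trial's curve
            # at the same epoch: in this toy model it can never catch up.
            if early_stop and epoch >= 3 and metric < 0.5 * best * epoch / epochs:
                break
        best = max(best, metric)
    return best, epochs_spent
```

With a fixed seed, the early-stopping run reaches the same best metric while spending no more epochs: a stopped trial's quality was below half the incumbent's, so it could never have won.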

Topic 1 Question 41

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 41 discussion

Your team is building an application for a global bank that will be used by millions of customers. You built a forecasting model that predicts customers' account balances 3 days in the future. Your team will use the results in a new feature that will notify users when their account balance is likely to drop below $25. How should you serve your predictions?

  • A. 1. Create a Pub/Sub topic for each user. 2. Deploy a Cloud Function that sends a notification when your model predicts that a user's account balance will drop below the $25 threshold.
  • B. 1. Create a Pub/Sub topic for each user. 2. Deploy an application on the App Engine standard environment that sends a notification when your model predicts that a user's account balance will drop below the $25 threshold.
  • C. 1. Build a notification system on Firebase. 2. Register each user with a user ID on the Firebase Cloud Messaging server, which sends a notification when the average of all account balance predictions drops below the $25 threshold.
  • D. 1. Build a notification system on Firebase. 2. Register each user with a user ID on the Firebase Cloud Messaging server, which sends a notification when your model predicts that a user's account balance will drop below the $25 threshold.
Show Suggested Answer Hide Answer
Suggested Answer: D 🗳️
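As a sketch of option D's serving path: the notification decision is per-user, driven by that user's own forecast. The Firebase Cloud Messaging call is shown only in comments (assuming the firebase_admin Python SDK); the helper name and threshold constant are hypothetical.

```python
# Hypothetical per-user check behind option D. The FCM send would look like:
#
#   from firebase_admin import messaging
#   messaging.send(messaging.Message(
#       token=user_device_token,  # from FCM registration of the user's app
#       notification=messaging.Notification(
#           title="Low balance warning",
#           body="Your balance may drop below $25 within 3 days.")))

THRESHOLD_USD = 25.0

def should_notify(predicted_balance: float, threshold: float = THRESHOLD_USD) -> bool:
    # Option D keys the alert off each user's own forecast; option C's
    # "average of all account balance predictions" would alert everyone
    # (or no one) at once, which is why it is wrong.
    return predicted_balance < threshold
```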

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
salsabilsf
Highly Voted 3 years, 11 months ago
Should be D ! creating a Pub/Sub topic for each user is overkill
upvoted 20 times
Y2Data
3 years, 7 months ago
Yes, create a topic is overkill but not a NOTIFICATION SYSTEM. it's totally normal. Seriously, the step two involves "REGISTER EACH USER ....", how is this better than create a topic???? should be A and it's so obvious!
upvoted 3 times
q4exam
3 years, 7 months ago
I think A is the straightforward answer, but in real life customers also consider cost, so practically App Engine would be picked in this case, because of the large user base.
upvoted 3 times
...
...
...
SlipperySlope
Highly Voted 3 years, 2 months ago
Selected Answer: D
D is correct. Firebase is designed for exactly this sort of scenario. Also, it would not be possible to create millions of pubsub topics due to GCP quotas https://cloud.google.com/pubsub/quotas#quotas https://firebase.google.com/docs/cloud-messaging
upvoted 9 times
...
OpenKnowledge
Most Recent 2 months ago
Selected Answer: D
This is similar to how the push notification works to send notifications to ios and android devices. Firebase cloud messaging (FCM), formerly known as Google cloud messaging (GCM), can be set to configure push notifications to ios, Android and web applications from server (in this case the banking application server) when the user registers his device to receive the notifications.
upvoted 1 times
...
PhilipKoku
11 months, 1 week ago
Selected Answer: D
D) Firebase
upvoted 1 times
...
fragkris
1 year, 5 months ago
Selected Answer: D
D is correct. Firebase is used for applications.
upvoted 1 times
...
harithacML
1 year, 10 months ago
Selected Answer: A
Simple answer: use the tools most mentioned during training, i.e., Cloud Functions.
upvoted 1 times
Kowalski
1 year, 8 months ago
Pub/Sub has a limit of 10,000 topics only and can't be increased https://cloud.google.com/pubsub/quotas#resource_limits.
upvoted 3 times
...
...
M25
2 years ago
Selected Answer: D
Went with D
upvoted 1 times
...
SergioRubiano
2 years, 1 month ago
Selected Answer: D
"Create a Pub/Sub topic for each user" xD
upvoted 2 times
...
Mohamed_Mossad
2 years, 10 months ago
Selected Answer: D
"Create a Pub/Sub topic for each user" is crazy; we cannot imagine a system with millions of Pub/Sub topics, so A and B are wrong. C is also wrong.
upvoted 3 times
...
mmona19
3 years ago
Selected Answer: D
D- is more automated compared to A. A is overkill
upvoted 1 times
...
Vidyasagar
3 years, 3 months ago
Selected Answer: D
I think, D is the best answer
upvoted 3 times
...
fdmenendez
3 years, 3 months ago
Project limit is 10,000 topics, you could have multiple projects but that does not scale well. so D. https://cloud.google.com/pubsub/quotas#resource_limits
upvoted 4 times
...
NamitSehgal
3 years, 4 months ago
D looks more relevant. Notification messages: simply display message content, handled by the FCM SDK. Data messages: display a message with some set of interactions.
upvoted 3 times
...
Danny2021
3 years, 6 months ago
A doesn't work. There is a quota limit on the number of pub/sub topics you can create, also one Cloud function cannot subscribe to millions of topics. A doesn't scale at all.
upvoted 3 times
...
Danny2021
3 years, 6 months ago
Answer is D. FCM is designed for this type of notification sent to mobile and desktop apps.
upvoted 4 times
...

Topic 1 Question 42

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 42 discussion

You work for an advertising company and want to understand the effectiveness of your company's latest advertising campaign. You have streamed 500 MB of campaign data into BigQuery. You want to query the table, and then manipulate the results of that query with a pandas dataframe in an AI Platform notebook.
What should you do?

  • A. Use AI Platform Notebooks' BigQuery cell magic to query the data, and ingest the results as a pandas dataframe.
  • B. Export your table as a CSV file from BigQuery to Google Drive, and use the Google Drive API to ingest the file into your notebook instance.
  • C. Download your table from BigQuery as a local CSV file, and upload it to your AI Platform notebook instance. Use pandas.read_csv to ingest the file as a pandas dataframe.
  • D. From a bash cell in your AI Platform notebook, use the bq extract command to export the table as a CSV file to Cloud Storage, and then use gsutil cp to copy the data into the notebook. Use pandas.read_csv to ingest the file as a pandas dataframe.
Show Suggested Answer Hide Answer
Suggested Answer: A 🗳️
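A sketch of option A's flow. The %%bigquery cell magic and the client-library call appear only in comments (assuming the google-cloud-bigquery library); a hypothetical stand-in dataframe replaces the query result so the snippet runs without GCP credentials.

```python
import pandas as pd

# In an AI Platform (Vertex AI Workbench) notebook, the cell magic binds
# the query result to a dataframe directly:
#   %%bigquery df
#   SELECT campaign, clicks FROM `project.dataset.campaign_table`
# The equivalent client-library call would be:
#   from google.cloud import bigquery
#   df = bigquery.Client().query(sql).to_dataframe()

# Stand-in for the query result (table and values are made up):
df = pd.DataFrame({"campaign": ["a", "b", "a"], "clicks": [10, 20, 5]})

# ...then manipulate it like any pandas dataframe:
clicks_by_campaign = df.groupby("campaign")["clicks"].sum().sort_values(ascending=False)
```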

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
zosoabi
Highly Voted 3 years, 11 months ago
A: no "CSV" found in provided link https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas
upvoted 27 times
...
Sum_Sum
Highly Voted 1 year, 5 months ago
Selected Answer: A
A is the Google-recommended answer and what you should use; C is what the intern does ...
upvoted 6 times
sharth
1 year, 4 months ago
Dude, I laughed so hard
upvoted 3 times
...
...
b7ad1d9
Most Recent 1 month, 3 weeks ago
Selected Answer: A
It doesn't make any sense to move the data to CSV like the other options do. BQ magic commands or BQ API are the way to go
upvoted 1 times
...
OpenKnowledge
2 months ago
Selected Answer: A
BigQuery cell magic refers to the use of specific commands within Jupyter notebooks (including environments like Colab and Vertex AI Workbench) to interact with Google BigQuery directly from a code cell. The magic commands allow the user to execute SQL queries, return results as a pandas dataframe, and save results to a variable.
upvoted 1 times
...
PhilipKoku
11 months, 1 week ago
Selected Answer: A
A) Magic command
upvoted 1 times
...
M25
2 years ago
Selected Answer: A
Went with A
upvoted 2 times
...
SergioRubiano
2 years, 1 month ago
Selected Answer: A
A, Using the command %%bigquery df
upvoted 1 times
...
Dunnoth
2 years, 2 months ago
Why not D? Using BQ notebook magic would be OK for one-time use, but usually a DS would reload the data multiple times, and every time you need to stream 500 MB of data to the notebook instance from BQ. Isn't it cheaper to store the data as a CSV in a bucket?
upvoted 2 times
...
John_Pongthorn
2 years, 3 months ago
Selected Answer: A
%%bigquery df
SELECT name, SUM(number) as count
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY count DESC
LIMIT 3

print(df.head())
upvoted 4 times
...
hiromi
2 years, 5 months ago
Selected Answer: A
A https://cloud.google.com/bigquery/docs/visualize-jupyter
upvoted 2 times
...
Sachin2360
2 years, 10 months ago
Answer: A. Refer to this link for details: https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas (the first two points talk about querying the data):
Download query results to a pandas DataFrame by using the BigQuery Storage API from the IPython magics for BigQuery in a Jupyter notebook.
Download query results to a pandas DataFrame by using the BigQuery client library for Python.
Download BigQuery table data to a pandas DataFrame by using the BigQuery client library for Python.
Download BigQuery table data to a pandas DataFrame by using the BigQuery Storage API client library for Python.
upvoted 2 times
...
Mohamed_Mossad
2 years, 11 months ago
Selected Answer: A
https://googleapis.dev/python/bigquery/latest/magics.html#ipython-magics-for-bigquery
upvoted 2 times
...
NickNtaken
3 years ago
Selected Answer: A
this is the simplest and most straightforward way read BQ data into Pandas dataframe.
upvoted 3 times
...
mmona19
3 years ago
Selected Answer: C
both A and C is technically correct. C has more manual step and A has less. The question does not ask which requires least effort. so C is clear answer
upvoted 1 times
wish0035
2 years, 4 months ago
"A and C are valid, but C is more difficult than A. they don't ask to be easier so I will go with the more difficult". WHAAAT? Google best practices are always: easier > harder. Even they encourage you to skip ML if you don't need ML.
upvoted 2 times
...
...
SlipperySlope
3 years, 2 months ago
Selected Answer: C
C is the correct answer due to the size of the data. It wouldn't be possible to download it all into an in memory data frame.
upvoted 1 times
u_phoria
2 years, 10 months ago
500mb of data into a pandas dataframe generally isn't a problem, far from it.
upvoted 2 times
...
...
ggorzki
3 years, 3 months ago
Selected Answer: A
IPython magics for BigQuery https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas
upvoted 1 times
...
NamitSehgal
3 years, 4 months ago
I agree with A
upvoted 1 times
...

Topic 1 Question 43

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 43 discussion

You are an ML engineer at a global car manufacturer. You need to build an ML model to predict car sales in different cities around the world. Which features or feature crosses should you use to train city-specific relationships between car type and number of sales?

  • A. Three individual features: binned latitude, binned longitude, and one-hot encoded car type.
  • B. One feature obtained as an element-wise product between latitude, longitude, and car type.
  • C. One feature obtained as an element-wise product between binned latitude, binned longitude, and one-hot encoded car type.
  • D. Two feature crosses as an element-wise product: the first between binned latitude and one-hot encoded car type, and the second between binned longitude and one-hot encoded car type.
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️
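A toy version of option C's single feature cross, with made-up bin counts and a hypothetical car-type vocabulary; TensorFlow would build the equivalent cross with bucketized and crossed feature columns.

```python
# Toy version of option C: one feature cross of binned latitude, binned
# longitude, and one-hot car type. Bin counts and car types are invented.
N_LAT_BINS, N_LON_BINS, CAR_TYPES = 10, 10, ["sedan", "suv", "truck"]

def crossed_feature(lat_bin: int, lon_bin: int, car_type: str) -> int:
    """Map (lat bin, lon bin, car type) to a single crossed-feature index,
    so the model can learn one weight per (city area, car type) pair."""
    type_id = CAR_TYPES.index(car_type)
    return (lat_bin * N_LON_BINS + lon_bin) * len(CAR_TYPES) + type_id
```

With 10 x 10 x 3 = 300 buckets, the same car type in two different city cells lands in different buckets, which is exactly the city-specific relationship the question asks for; option D's two separate crosses instead tie together every cell that shares a latitude (or longitude) band.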

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
Paul_Dirac
Highly Voted 4 years, 3 months ago
C https://developers.google.com/machine-learning/crash-course/feature-crosses/check-your-understanding
upvoted 22 times
...
ebinv2
Highly Voted 4 years, 4 months ago
C should be the answer
upvoted 8 times
...
mayankblitzster
Most Recent 1 month, 3 weeks ago
Selected Answer: C
Captures city-specific relationships: crossing binned latitude and longitude creates a unique identifier for a specific geographic area or "city block". Crossing this with the one-hot encoded car type allows the model to learn sales patterns that are specific to each car type within those defined city areas.
Binning handles continuous data: binning (bucketizing) latitude and longitude converts these continuous values into discrete, categorical features, which are then suitable for feature crosses. Binning also enables the model to learn non-linear relationships within a single feature.
Handles non-linearities and interactions: feature crosses, like this element-wise product, allow linear models to capture non-linear relationships and interactions between features, providing more predictive power than individual features alone.
upvoted 2 times
...
mayankblitzster
1 month, 3 weeks ago
Selected Answer: D
Two feature crosses as an element-wise product: the first between binned latitude and one-hot encoded car type, and the second between binned longitude and one-hot encoded car type. Why? This approach allows the model to learn city-specific relationships between car types and sales by:
Binning latitude and longitude: groups nearby locations into discrete regions (e.g., cities or metro areas).
One-hot encoding car types: makes each car type a distinct feature.
Crossing location bins with car types: captures how preferences for car types vary by geographic region.
By creating two separate feature crosses (latitude × car type and longitude × car type), the model can learn nuanced patterns without overcomplicating the feature space.
upvoted 1 times
...
b7ad1d9
1 month, 3 weeks ago
Selected Answer: C
C is the closest here. The REAL answer would be feature cross only Binned Latitude and Binned Longitude to get a feature that represents a city/location/sales area. There is no real need to also add the model in the feature cross. But given the choices C is the best fit.
upvoted 1 times
...
desertlotus1211
1 year ago
Why not Answer D?
upvoted 1 times
desertlotus1211
1 year ago
Would C be more complex than D?
upvoted 1 times
...
...
baimus
1 year, 2 months ago
While I acknowledge the answer is C, it seems wrong to element-wise combine binned lat/lon, as it means there are at least 2 places with the same number in the world, probably more. Not only that, but multiplying the binned values implies they are ordinal, yet they are not ordinal in the same direction, so the relationship to price will be lost (a good example: northern countries tend to be richer, but the east/west relationship isn't defined).
upvoted 2 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: C
C) one feature
upvoted 1 times
...
Sum_Sum
1 year, 12 months ago
C - everything else is madness
upvoted 2 times
rigori
1 year, 4 months ago
creating this cross feature is madness from explainability standpoint
upvoted 1 times
...
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
Mohamed_Mossad
3 years, 5 months ago
Selected Answer: C
https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture
upvoted 6 times
...
A4M
3 years, 9 months ago
C - Answer when doing feature cross the features need to be binned
upvoted 4 times
...
MK_Ahsan
3 years, 10 months ago
Selected Answer: C
https://developers.google.com/machine-learning/crash-course/feature-crosses/check-your-understanding Answer C: It needs a feature cross to obtain one feature.
upvoted 3 times
...
NamitSehgal
3 years, 10 months ago
I got with C
upvoted 3 times
...
ramen_lover
4 years ago
"Element-wise product" sounds like we are not using a feature cross but artificially creating a new column whose value is the element-wise product of other column values; i.e., (1, 2, 3) => 1 * 2 * 3 = 6. I am not a native English speaker; thus, I might have misunderstood the sentence.
upvoted 1 times
...
ralf_cc
4 years, 4 months ago
D - https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture
upvoted 4 times
jk73
4 years, 1 month ago
Cannot be D, Despite Binning is a good idea because it enables the model to learn nonlinear relationships within a single feature; separate latitude and longitude in different feature crosses is not a good one, this separation will prevent the model from learning city-specific sales. A city is the conjunction of latitude and longitude. In that order of Ideas Crossing binned latitude with binned longitude enables the model to learn city-specific effects of car type. I will go for C, https://developers.google.com/machine-learning/crash-course/feature-crosses/check-your-understanding
upvoted 13 times
george_ognyanov
4 years, 1 month ago
Damn that was a good explanation. Thank you for writing it out.
upvoted 2 times
...
...
...

Topic 1 Question 44

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 44 discussion

You work for a large technology company that wants to modernize their contact center. You have been asked to develop a solution to classify incoming calls by product so that requests can be more quickly routed to the correct support team. You have already transcribed the calls using the Speech-to-Text API. You want to minimize data preprocessing and development time. How should you build the model?

  • A. Use the AI Platform Training built-in algorithms to create a custom model.
  • B. Use AutoML Natural Language to extract custom entities for classification.
  • C. Use the Cloud Natural Language API to extract custom entities for classification.
  • D. Build a custom model to identify the product keywords from the transcribed calls, and then run the keywords through a classification algorithm.
Show Suggested Answer Hide Answer
Suggested Answer: B 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
chohan
Highly Voted 4 years, 4 months ago
Should be B -> minimize data preprocessing and development time
upvoted 25 times
neohanju
4 years, 2 months ago
I thought the answer is B too. However, after carefully reading the question and answers again, B produces entities for classification only, not a classification result. So, A and D are only candidates and A is better.
upvoted 2 times
...
sensev
4 years, 3 months ago
Agree its B. A and D is incorrect since it requires more development time. C is also incorrect since the product is company specific and might not be well recognized by Cloud Natural Language API.
upvoted 7 times
...
...
baimus
Highly Voted 3 years, 7 months ago
I'm leaning towards C over B here. The question is underlining that minimal development time is required, and C is even less than B. If the information is really domain specific, then you'd need B, but it's not clear what products the company sells, so we don't have enough info to say it's too domain specific for C.
upvoted 6 times
giaZ
3 years, 7 months ago
If anything, C is wrong because it tells you something that is not true: extract custom entities with Natural Language API it's not possible. That is something you can do only with AutoML. Look at this comparison table: https://cloud.google.com/natural-language#section-6 That's how they subtly point you at answer B.
upvoted 10 times
...
...
billyst41
Most Recent 1 month, 3 weeks ago
Selected Answer: B
From Gemini: Can you extract custom entities from the cloud natural language API? No, you cannot train the Cloud Natural Language API to extract custom entities. The Natural Language API is a pre-trained model that can only extract a fixed set of entity types, such as people, organizations, and locations
upvoted 2 times
...
mayankblitzster
1 month, 3 weeks ago
Selected Answer: B
A. Use AI Platform Training built-in algorithms: requires more manual setup, preprocessing, and model tuning; not ideal for minimizing development time.
C. Use the Cloud Natural Language API to extract custom entities: the Cloud Natural Language API is for pre-trained entity types (e.g., locations, organizations), not custom classification.
D. Build a custom model to identify product keywords and classify: this is a manual and time-consuming approach, contrary to the goal of minimizing development time.
upvoted 1 times
...
b7ad1d9
1 month, 3 weeks ago
Selected Answer: B
These are custom product names that need to be extracted from the text. AutoML is the right option. Cloud NL API is for understanding general purpose of text like sentiment etc. It doesn't qute fit here
upvoted 1 times
...
gvk1
6 months, 4 weeks ago
Selected Answer: C
This is direct question from this doc: https://medium.com/@pysquad/ai-infused-empowerment-harnessing-business-potential-with-google-clouds-nlp-api-in-python-b4fcb9fea1a3. "Automated Ticket Classification: Categorize and prioritize support tickets for efficient handling."
upvoted 1 times
...
theseawillclaim
10 months, 3 weeks ago
Selected Answer: C
I think it's C. Natural Language API supports entity extraction, is pre-trained and comes off as a SaaS that you just pay per use. Considering you have already used STT on your data, I see no reason to use AutoML in this case.
upvoted 2 times
...
misya
1 year, 2 months ago
Selected Answer: C
The Cloud NLP API requires no custom training.
upvoted 2 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: C
C) Cloud NLP API
upvoted 2 times
...
21c17b3
1 year, 8 months ago
I'm voting C here!
upvoted 3 times
...
ralf_cc
1 year, 9 months ago
AutoML only has classification and regression
upvoted 2 times
...
pico
2 years, 2 months ago
Selected Answer: C
Key differences:
Approach: option B (AutoML Natural Language) involves using an AutoML service to train a custom NLP model, while option C (Cloud Natural Language API) relies on a pre-built NLP API.
Control and customization: option B gives you more control and customization over the training process, as you train a model specific to your needs. Option C offers less control but is quicker to set up since it uses a pre-built API.
Complexity: option B might require more technical expertise to set up and configure the AutoML model, while option C is more straightforward and user-friendly.
In summary, both options allow you to extract custom entities for classification, but option B (AutoML) involves more manual involvement in training a custom model, while option C (Cloud Natural Language API) provides a simpler, pre-built solution.
upvoted 3 times
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 2 times
...
lucaluca1982
2 years, 6 months ago
Selected Answer: C
why not C?
upvoted 1 times
julliet
2 years, 5 months ago
you have to classify company products, which are custom classes
upvoted 1 times
pico
2 years, 2 months ago
you can still use Option C (Cloud Natural Language API) even when the solution needs to classify incoming calls by company-specific products rather than general products. The Cloud Natural Language API can be customized to handle company-specific entities and classifications effectively.
upvoted 2 times
...
YushiSato
1 year, 2 months ago
It seems to me that if there is a product name that needs to be learned in AutoML Natural Language, there is a possibility that it cannot be transcribed into text by the Speech-to-Text API in the first place.
upvoted 1 times
...
...
...
John_Pongthorn
2 years, 8 months ago
Selected Answer: B
AutoML is appropriate to classify incoming calls by product (Custom) to be routed to the correct support team. Cloud Natural Language API is for general case (not particular business)
upvoted 1 times
...
Mohamed_Mossad
3 years, 5 months ago
Selected Answer: B
"Minimize data preprocessing and development time" limits the answer to B or C; will choose B, as the Natural Language API does not handle custom entities.
upvoted 2 times
...
mmona19
3 years, 7 months ago
B- automl custom classification and entity is going to help with minimum effort.
upvoted 4 times
...

Topic 1 Question 45

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 45 discussion

You are training a TensorFlow model on a structured dataset with 100 billion records stored in several CSV files. You need to improve the input/output execution performance. What should you do?

  • A. Load the data into BigQuery, and read the data from BigQuery.
  • B. Load the data into Cloud Bigtable, and read the data from Bigtable.
  • C. Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage.
  • D. Convert the CSV files into shards of TFRecords, and store the data in the Hadoop Distributed File System (HDFS).
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️
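A sketch of option C's layout. The actual byte writing would use TensorFlow's tf.io.TFRecordWriter; only the shard assignment and the conventional data-XXXXX-of-YYYYY naming are shown here (helper names are hypothetical), so the snippet runs without TensorFlow installed.

```python
# Sharding sketch: spread records across many TFRecord files so that
# tf.data can interleave reads from several shards in parallel.
def shard_name(prefix: str, shard: int, num_shards: int) -> str:
    # Conventional TFRecord shard naming, e.g. train-00003-of-00100.tfrecord
    return f"{prefix}-{shard:05d}-of-{num_shards:05d}.tfrecord"

def assign_shards(num_records: int, num_shards: int) -> list:
    """Round-robin records across shards so files stay evenly sized."""
    shards = [[] for _ in range(num_shards)]
    for i in range(num_records):
        shards[i % num_shards].append(i)
    return shards
```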

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
ralf_cc
Highly Voted 3 years, 10 months ago
C - not enough info in the question, but C is the "most correct" one
upvoted 26 times
...
mayankblitzster
Most Recent 1 month, 3 weeks ago
Selected Answer: C
Convert CSV files to TFRecord format and use the tf.data API. TFRecord is TensorFlow's optimized binary format for storing large datasets. It is much faster to read and parse than CSV, especially at scale, reduces CPU overhead, and improves throughput during training.
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(parse_function, num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE)
upvoted 3 times
...
theseawillclaim
10 months, 3 weeks ago
Selected Answer: C
It's C. BigTable would usually help with heavy I/O ops, but is not suited for (semi)structured data by design.
upvoted 1 times
...
PhilipKoku
11 months, 1 week ago
Selected Answer: C
C) The most suitable option for improving input/output execution performance in this scenario is C. Convert the CSV files into shards of TFRecords and store the data in Cloud Storage. This approach leverages the efficiency of TFRecords and the scalability of Cloud Storage, aligning with TensorFlow best practices.
upvoted 4 times
...
fragkris
1 year, 5 months ago
Selected Answer: C
C is the google reccomended approach.
upvoted 1 times
...
Sum_Sum
1 year, 5 months ago
C is the correct one as BQ will not help you with performance
upvoted 1 times
...
peetTech
1 year, 7 months ago
Selected Answer: C
C https://datascience.stackexchange.com/questions/16318/what-is-the-benefit-of-splitting-tfrecord-file-into-shards#:~:text=Splitting%20TFRecord%20files%20into%20shards,them%20through%20a%20training%20process.
upvoted 2 times
...
ftl
1 year, 7 months ago
bard: The correct answer is: C. Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage. TFRecords is a TensorFlow-specific binary format that is optimized for performance. Converting the CSV files into TFRecords will improve the input/output execution performance. Sharding the TFRecords will allow the data to be read in parallel, which will further improve performance. The other options are not as likely to improve performance. Loading the data into BigQuery or Cloud Bigtable will add an additional layer of abstraction, which can slow down performance. Storing the TFRecords in HDFS is not likely to improve performance, as HDFS is not optimized for TensorFlow.
upvoted 2 times
...
tavva_prudhvi
1 year, 9 months ago
Using BigQuery or Bigtable may not be the most efficient option for input/output operations with TensorFlow. Storing the data in HDFS may be an option, but Cloud Storage is generally a more scalable and cost-effective solution.
upvoted 1 times
...
PST21
1 year, 11 months ago
While Bigtable can offer high-performance I/O capabilities, it is important to note that it is primarily designed for structured data storage and real-time access patterns. In this scenario, the focus is on optimizing input/output execution performance, and using TFRecords in Cloud Storage aligns well with that goal.
upvoted 1 times
...
Voyager2
1 year, 11 months ago
Selected Answer: A
A. Load the data into BigQuery, and read the data from BigQuery. https://cloud.google.com/blog/products/ai-machine-learning/tensorflow-enterprise-makes-accessing-data-on-google-cloud-faster-and-easier Precisely in the link provided in other comments, it shows that the best result with TFRecords is 18,752 records per second, while the same report shows BigQuery at more than 40,000 records per second.
upvoted 2 times
tavva_prudhvi
1 year, 9 months ago
BigQuery is designed for running large-scale analytical queries, not for serving input pipelines for machine learning models like TensorFlow. BigQuery's strength is in its ability to handle complex queries over vast amounts of data, but it may not provide the optimal performance for the specific task of feeding data into a TensorFlow model. On the other hand, converting the CSV files into shards of TFRecords and storing them in Cloud Storage (Option C) will provide better performance because TFRecords is a format designed specifically for TensorFlow. It allows for efficient storage and retrieval of data, making it a more suitable choice for improving the input/output execution performance. Additionally, Cloud Storage provides high throughput and low-latency data access, which is beneficial for training large-scale TensorFlow models.
upvoted 3 times
...
...
M25
2 years ago
Selected Answer: C
Went with C
upvoted 2 times
...
shankalman717
2 years, 2 months ago
Selected Answer: C
Cloud Bigtable is typically used to process unstructured data, such as time-series data, logs, or other types of data that do not conform to a fixed schema. However, Cloud Bigtable can also be used to store structured data if necessary, such as in the case of a key-value store or a database that does not require complex relational queries.
upvoted 1 times
...
shankalman717
2 years, 2 months ago
Selected Answer: C
Option C, converting the CSV files into shards of TFRecords and storing the data in Cloud Storage, is the most appropriate solution for improving input/output execution performance in this scenario
upvoted 1 times
...
behzadsw
2 years, 4 months ago
Selected Answer: A
https://cloud.google.com/architecture/ml-on-gcp-best-practices#store-tabular-data-in-bigquery BigQuery for structured data, Cloud Storage for unstructured data
upvoted 4 times
ShePiDai
1 year, 11 months ago
agree. BigQuery and Cloud Storage have effectively identical storage performance, where BigQuery is optimised for structured dataset and GCS for unstructured.
upvoted 1 times
...
...
Mohamed_Mossad
2 years, 11 months ago
Selected Answer: D
"100 billion records stored in several CSV files" means we are dealing with a distributed big data problem, so HDFS is very suitable. Will choose D.
upvoted 1 times
hoai_nam_1512
2 years, 8 months ago
HDFS will require more resources; 100 billion records are processed fine with Cloud Storage objects.
upvoted 2 times
...
...

Topic 1 Question 46

Exam Professional Machine Learning Engineer topic 1 question 46 discussion

As the lead ML Engineer for your company, you are responsible for building ML models to digitize scanned customer forms. You have developed a TensorFlow model that converts the scanned images into text and stores them in Cloud Storage. You need to use your ML model on the aggregated data collected at the end of each day with minimal manual intervention. What should you do?

  • A. Use the batch prediction functionality of AI Platform.
  • B. Create a serving pipeline in Compute Engine for prediction.
  • C. Use Cloud Functions for prediction each time a new data point is ingested.
  • D. Deploy the model on AI Platform and create a version of it for online inference.
Suggested Answer: A 🗳️

Comments

Paul_Dirac
Highly Voted 3 years, 10 months ago
Use the model at the end of the day => not C, D. Minimize manual intervention => not B. Ans: A
upvoted 30 times
...
mayankblitzster
Most Recent 1 month, 3 weeks ago
Selected Answer: A
You need to run your TensorFlow model on aggregated data collected daily, with minimal manual intervention. This is a classic use case for batch prediction, which is:
  • Automated: can be scheduled or triggered via pipeline.
  • Scalable: handles large volumes of data efficiently.
  • Serverless: no need to manage infrastructure.
  • Integrated: works seamlessly with models deployed on AI Platform (Vertex AI).
upvoted 1 times
...
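The daily batch flow described in this thread can be sketched in plain Python. This is a conceptual stand-in only (the real job would be submitted to AI Platform batch prediction, not run locally), and `digitize` plus the file names are hypothetical:

```python
# Conceptual sketch of the daily batch flow: aggregate the day's scanned
# forms and run the model over all of them in one scheduled job, instead
# of predicting per file as it arrives (online inference).
# `digitize` is a hypothetical stand-in for the deployed TensorFlow model.

def digitize(scanned_image):
    # Placeholder transformation standing in for OCR model output.
    return scanned_image.upper()

def run_daily_batch(days_files):
    # One scheduled job processes the whole day's aggregate with no
    # manual intervention, which is what batch prediction provides.
    return {name: digitize(name) for name in days_files}

results = run_daily_batch(["form_001.png", "form_002.png"])
print(results["form_001.png"])  # FORM_001.PNG
```

The point of the sketch is the shape of the pipeline: one trigger, one pass over the day's aggregate, results written out in bulk, rather than a per-request serving path.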
PhilipKoku
11 months, 1 week ago
Selected Answer: A
A) This a batch prediction using AI Platform
upvoted 1 times
...
Arthurious
1 year, 1 month ago
Selected Answer: A
A is the most efficient
upvoted 1 times
...
Sum_Sum
1 year, 5 months ago
Selected Answer: A
A is the only way
upvoted 1 times
...
M25
2 years ago
Selected Answer: A
Went with A
upvoted 1 times
...
ares81
2 years, 4 months ago
Selected Answer: A
There is only A, for me.
upvoted 1 times
...
koakande
2 years, 4 months ago
Selected Answer: A
Because aggregated data can be sent at the end of the day for batch prediction, and AI Platform is managed, this satisfies the minimal-intervention requirement. Not B, as it violates the minimal-intervention requirement. Not C and D, as real-time or online inference is not needed since data is aggregated at the end of the day.
upvoted 3 times
...
hiromi
2 years, 5 months ago
Selected Answer: A
You need to use your ML model on the aggregated data collected at the end of each day with minimal manual intervention.
upvoted 1 times
...
seifou
2 years, 5 months ago
A. https://datatonic.com/insights/vertex-ai-improving-debugging-batch-prediction/#:~:text=Vertex%20AI%20Batch%20Prediction%20provides,to%20GCS%20or%20BigQuery%2C%20respectively.
upvoted 1 times
...
Mohamed_Mossad
2 years, 11 months ago
Selected Answer: A
"You need to use your ML model on the aggregated data" that means we need the batch prediction feature in AI platform
upvoted 1 times
...
ggorzki
3 years, 3 months ago
Selected Answer: A
A https://cloud.google.com/ai-platform/prediction/docs/batch-predict
upvoted 3 times
...
george_ognyanov
3 years, 7 months ago
Another vote for A. Technically, through the right lens D could be correct as well, but what tipped me towards A was batch vs online predictions and the need for less manual work.
upvoted 3 times
...
Y2Data
3 years, 7 months ago
https://cloud.google.com/ai-platform/prediction/docs/batch-predict
upvoted 2 times
...

Topic 1 Question 47

Exam Professional Machine Learning Engineer topic 1 question 47 discussion

You recently joined an enterprise-scale company that has thousands of datasets. You know that there are accurate descriptions for each table in BigQuery, and you are searching for the proper BigQuery table to use for a model you are building on AI Platform. How should you find the data that you need?

  • A. Use Data Catalog to search the BigQuery datasets by using keywords in the table description.
  • B. Tag each of your model and version resources on AI Platform with the name of the BigQuery table that was used for training.
  • C. Maintain a lookup table in BigQuery that maps the table descriptions to the table ID. Query the lookup table to find the correct table ID for the data that you need.
  • D. Execute a query in BigQuery to retrieve all the existing table names in your project using the INFORMATION_SCHEMA metadata tables that are native to BigQuery. Use the result to find the table that you need.
Suggested Answer: A 🗳️

Comments

chohan
Highly Voted 4 years, 4 months ago
Should be A https://cloud.google.com/data-catalog/docs/concepts/overview
upvoted 19 times
...
mmona19
Highly Voted 3 years, 7 months ago
Selected Answer: A
Who is providing these answers?? It's clearly A. Most of the answers are incorrect here.
upvoted 7 times
...
mayankblitzster
Most Recent 1 month, 3 weeks ago
Selected Answer: A
Use Data Catalog:
  1. Go to the Google Cloud Console.
  2. Navigate to Data Catalog.
  3. Use the search bar to enter keywords related to your model (e.g., “user behavior”, “clickstream”, “booking history”).
  4. Review the table descriptions and schemas to identify the most relevant dataset.
  5. Once identified, you can preview the data directly or query it in BigQuery.
If Data Catalog isn't enabled, you can also use the BigQuery Explorer in the Cloud Console, search by keyword in the search bar at the top, or browse datasets manually if needed.
upvoted 2 times
...
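Conceptually, Data Catalog's keyword search over table descriptions is a metadata lookup like the one sketched below in plain Python; the table names and descriptions here are made up for illustration, not real datasets:

```python
# Toy sketch of keyword search over BigQuery table descriptions, the kind
# of lookup Data Catalog provides at enterprise scale across thousands of
# datasets. Table names and descriptions below are hypothetical.

TABLES = {
    "sales.daily_bookings": "Daily booking history aggregated per customer",
    "web.click_events": "Raw clickstream events from the website",
    "ml.training_features": "Curated user behavior features for ML models",
}

def search_by_description(keyword):
    # Case-insensitive substring match against each table's description.
    kw = keyword.lower()
    return sorted(t for t, desc in TABLES.items() if kw in desc.lower())

print(search_by_description("clickstream"))  # ['web.click_events']
```

The real service also indexes schemas, tags, and column names; the sketch only shows why searchable descriptions (option A) beat maintaining a manual lookup table (option C).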
Fer660
2 months, 2 weeks ago
Selected Answer: A
It is A, but Data Catalog is now deprecated and this question could likely be removed from the test set.
upvoted 1 times
...
louisaok
1 year ago
Selected Answer: A
A is the right one
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: A
A) Data Catalog
upvoted 1 times
...
fragkris
1 year, 11 months ago
Selected Answer: A
A without hesitation.
upvoted 1 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: A
A is the only way
upvoted 1 times
...
SamuelTsch
2 years, 4 months ago
Selected Answer: A
A should be correct
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: A
Went with A
upvoted 2 times
...
TheGrew
3 years, 8 months ago
Selected Answer: A
Another vote for A by me.
upvoted 1 times
...
NamitSehgal
3 years, 10 months ago
Selected Answer: A
A should be the way to go for large datasets. This is also good, but it is the legacy way of checking: INFORMATION_SCHEMA contains these views for table metadata: TABLES and TABLE_OPTIONS for metadata about tables; COLUMNS and COLUMN_FIELD_PATHS for metadata about columns and fields; PARTITIONS for metadata about table partitions (Preview).
upvoted 3 times
...
JobQ
3 years, 10 months ago
I vote A
upvoted 1 times
...
george_ognyanov
4 years, 1 month ago
Another vote for answer A from me.
upvoted 1 times
...

Topic 1 Question 48

Exam Professional Machine Learning Engineer topic 1 question 48 discussion

You started working on a classification problem with time series data and achieved an area under the receiver operating characteristic curve (AUC ROC) value of 99% for training data after just a few experiments. You haven't explored using any sophisticated algorithms or spent any time on hyperparameter tuning. What should your next step be to identify and fix the problem?

  • A. Address the model overfitting by using a less complex algorithm.
  • B. Address data leakage by applying nested cross-validation during model training.
  • C. Address data leakage by removing features highly correlated with the target value.
  • D. Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.
Suggested Answer: B 🗳️

Comments

Paul_Dirac
Highly Voted 4 years, 4 months ago
Ans: B (Ref: https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9) (C) High correlation doesn't mean leakage. The question may suggest target leakage and the defining point of this leakage is the availability of data after the target is available.(https://www.kaggle.com/dansbecker/data-leakage)
upvoted 28 times
Jarek7
2 years, 4 months ago
This ref doesn't explain WHY we should use NCV in this case - it just explains HOW to use NCV when dealing with time series. Cross-validation, including nested cross-validation, is a powerful tool for model evaluation and hyperparameter tuning, but it does NOT DIRECTLY ADDRESS data leakage. Data leakage refers to a situation where information from the test dataset leaks into the training dataset, causing the model to have an unrealistically high performance. Nested cross-validation can indeed help provide a more accurate estimation of the model's performance on unseen data, but IT DOESN'T SOLVE the underlying issue of data leakage if it's already present.
upvoted 6 times
...
...
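The time-series cross-validation referenced in this thread is forward-chaining (expanding window): each validation fold sits strictly after its training window, so future information never leaks into training. A minimal stdlib sketch, with fold sizes chosen only for illustration:

```python
# Expanding-window (forward-chaining) splits for time-series CV:
# train on [0, split), validate on [split, split + horizon), so the
# validation fold is always strictly later than the training window.

def expanding_window_splits(n_samples, n_folds, horizon):
    splits = []
    for k in range(1, n_folds + 1):
        # Later folds get progressively longer training windows.
        split = n_samples - (n_folds - k + 1) * horizon
        if split <= 0:
            continue  # not enough history for this fold
        train_idx = list(range(0, split))
        valid_idx = list(range(split, split + horizon))
        splits.append((train_idx, valid_idx))
    return splits

# With 10 samples, 3 folds, horizon 2: fold 1 trains on 0..3 and
# validates on 4..5, fold 2 on 0..5 / 6..7, fold 3 on 0..7 / 8..9.
for train, valid in expanding_window_splits(10, 3, 2):
    assert max(train) < min(valid)  # no future data in training
```

Note this ordering prevents temporal leakage across folds, but (as Jarek7 points out) it cannot fix leakage that is baked into the features themselves.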
John_Pongthorn
Highly Voted 2 years, 8 months ago
Selected Answer: C
C: this is the correct choice, 1000000000%. This is a data leakage issue on the training data. https://cloud.google.com/automl-tables/docs/train#analyze The question is from this content: if a column's Correlation with Target value is high, make sure that is expected, and not an indication of target leakage. Let me explain it my own way: sometimes a feature used in the training data is unintentionally calculated from the target value, which results in high correlation between them. For instance, you predict a stock price using moving average, MACD, and RSI, despite the fact that these 3 features have been calculated from the price (the target).
upvoted 8 times
black_scissors
2 years, 5 months ago
I agree. Besides, when a CV is done randomly (not split by the time point) it can make things worse.
upvoted 2 times
...
...
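The leakage check this thread describes (flagging features whose correlation with the target is suspiciously high) can be sketched with a stdlib Pearson correlation. The 0.95 threshold and the data are arbitrary illustrations, not values from any Google guidance:

```python
import math

# Pearson correlation between a candidate feature and the target.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Flag features whose |correlation| with the target looks like leakage.
def suspicious_features(features, target, threshold=0.95):
    return [name for name, vals in features.items()
            if abs(pearson(vals, target)) > threshold]

target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "leaky":  [2.0, 4.0, 6.0, 8.0, 10.0],  # exactly target * 2
    "honest": [1.0, 0.0, 2.0, 0.0, 1.0],
}
print(suspicious_features(features, target))  # ['leaky']
```

As several commenters note, a flagged feature still needs a human judgment call: high correlation can be genuine predictive power rather than leakage, so the check surfaces candidates rather than deciding for you.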
mayankblitzster
Most Recent 1 month, 3 weeks ago
Selected Answer: C
C. Address data leakage by removing features highly correlated with the target value. Why? Achieving 99% AUC ROC on training data with minimal effort is a red flag. The most likely cause is data leakage, where the model has access to information it shouldn't during training. Features that are highly correlated with the target often leak label information, leading to unrealistically high performance.
❌ Why not the others?
  • A. Use a less complex algorithm: overfitting might not be the issue here; your model is likely learning from leaked data, not overfitting due to complexity.
  • B. Nested cross-validation: this helps with model selection and tuning, but it doesn't prevent leakage if the data itself is flawed.
  • D. Tuning hyperparameters to reduce AUC ROC: you don't tune to reduce performance; you tune to generalize better. High training AUC isn't the goal; generalization is.
upvoted 1 times
...
b7ad1d9
1 month, 3 weeks ago
Selected Answer: B
Strangely high ROC => overly low bias/high variance => data leakage. The time series nature of the data also points to data leakage, as data leakage is more common in time series datasets. Now only B references a method to reduce data leakage, i.e. nested cross-validation, which runs validation on multiple slices of data with different hyperparams (if I understood it correctly). Option C talks about removing highly correlated features, but there isn't enough info that THAT is what is leading to data leakage. If the correlation is due to leakage, the correct fix is to re-engineer or remove the leaky component, not just remove every highly correlated feature.
upvoted 1 times
...
Fer660
2 months, 2 weeks ago
Selected Answer: A
Frankly, none of the answers is fully acceptable. Certainly not C -- if we remove correlated features we are removing information from the data. Certainly not D -- if we tune to reduce AUROC, we are working against our own interest. Issues with B are described below by others. I thought A could be reasonable, except that the statement already tells us that we are not using a particularly sophisticated model. This question should be flagged for review by examtopics.
upvoted 1 times
...
Sivaram06
10 months ago
Selected Answer: B
Gemini Explanation: Nested Cross-validation: Nested cross-validation is a robust technique to detect and mitigate data leakage. It involves two loops of cross-validation: Inner loop: Tunes hyperparameters and performs model selection. Outer loop: Evaluates the model's performance on unseen data, giving you a more realistic estimate of how well your model generalizes. Why not C : C. Address data leakage by removing features highly correlated with the target value: While highly correlated features can sometimes be a sign of leakage, they might also be genuinely informative features. Removing them without proper analysis might hurt your model's performance.
upvoted 2 times
...
Pau1234
11 months, 2 weeks ago
Selected Answer: B
As per the PMLE cert book the answer is B. Since the model is performing well with training data, it is a case of data leakage. Cross-validation is one of the strategies to overcome data leakage.
upvoted 2 times
desertlotus1211
10 months, 2 weeks ago
The book mentions cross-validation.. NOT 'nested cross-validation' (page 34); however, this answer is better than C. You want to remove values that are NOT correlated, versus correlated as in answer C. ;)
upvoted 1 times
...
...
Foxy2021
1 year ago
Select answer: C. --reason--- While B (nested cross-validation) helps improve the evaluation process and prevents over-optimistic performance estimates, it doesn't tackle the root cause of data leakage. Data leakage is often caused by features that are too closely tied to the target—in this case, the unusually high AUC suggests that the model is gaining unfair information.
upvoted 2 times
...
chirag2506
1 year, 4 months ago
Selected Answer: B
B is the correct option
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: C
C) Is the best answer
upvoted 1 times
...
girgu
1 year, 5 months ago
Selected Answer: C
Nested cross-validation will not work for time series data. Time series data require an expanding-window training data set. It seems most likely the issue is high correlation in columns.
upvoted 1 times
...
AnnaR
1 year, 6 months ago
B: correct. considering c, but why should we remove a feature of highly predictive nature?? for me, this does not explain the problem of overfitting... a highly predictive feature is also useful for good performance evaluated on the test set. --> Decide for B!
upvoted 2 times
...
gscharly
1 year, 6 months ago
Selected Answer: B
agree with Paul_Dirac
upvoted 1 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: B
I initially went with B- however after reading this: https://machinelearningmastery.com/nested-cross-validation-for-machine-learning-with-python/ I think C is right. Quoted from the link: "Nested cross-validation is an approach to model hyperparameter optimization and model selection that attempts to overcome the problem of overfitting the training dataset.". Overfitting is exactly our problem here. Correlated features in the dataset may be a sign of data leakage, but they are not necessarily.
upvoted 1 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: B
I think its B. GPT4 makes a good argument about C: While this is a valid approach to handling data leakage, it might not be sufficient if the leakage is due to reasons other than high correlation, such as temporal leakage in time-series data.
upvoted 1 times
...
pico
2 years, 1 month ago
Selected Answer: A
Option A: This option is a reasonable choice. Switching to a less complex algorithm can help reduce overfitting, and using k-fold cross-validation can provide a better estimate of how well the model will generalize to unseen data. It's essential to ensure that the high performance isn't solely due to overfitting.
upvoted 1 times
pico
2 years, 1 month ago
Option B: Nested cross-validation is primarily used to estimate model performance accurately and select the best model hyperparameters. While it's a good practice, it doesn't directly address the overfitting issue. It helps prevent over-optimistic model performance estimates but doesn't necessarily fix the overfitting problem. Option C: Removing features highly correlated with the target value can be a valid step in feature selection or preprocessing. However, it doesn't directly address the overfitting issue or explain why the model is performing exceptionally well on the training data. It's a separate step from mitigating overfitting. Option D: This option is incorrect. Tuning hyperparameters should aim to improve model performance on the validation set, not reduce it. In summary, the most appropriate next step is Option A:
upvoted 3 times
...
...
atlas_lyon
2 years, 2 months ago
Selected Answer: B
B: If splits are done chronologically (as is always advised), nested CV should work. C: High correlation with the target means we have to check whether this is strong explanatory power or data leakage. Dropping the features won't help us distinguish those cases, but may help reveal the independent contribution of the remaining features.
upvoted 1 times
...

Topic 1 Question 49

Exam Professional Machine Learning Engineer topic 1 question 49 discussion

You work for an online travel agency that also sells advertising placements on its website to other companies. You have been asked to predict the most relevant web banner that a user should see next. Security is important to your company. The model latency requirements are 300ms@p99, the inventory is thousands of web banners, and your exploratory analysis has shown that navigation context is a good predictor. You want to implement the simplest solution. How should you configure the prediction pipeline?

  • A. Embed the client on the website, and then deploy the model on AI Platform Prediction.
  • B. Embed the client on the website, deploy the gateway on App Engine, and then deploy the model on AI Platform Prediction.
  • C. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Cloud Bigtable for writing and for reading the user's navigation context, and then deploy the model on AI Platform Prediction.
  • D. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Memorystore for writing and for reading the user's navigation context, and then deploy the model on Google Kubernetes Engine.
Suggested Answer: C 🗳️

Comments

Paul_Dirac
Highly Voted 4 years, 3 months ago
Security => not A. B: doesn't handle processing with banner inventory. D: deployment on GKE is less simple than on AI Platform. Besides, MemoryStore is in-memory while banners are stored persistently. Ans: C
upvoted 13 times
pinimichele01
1 year, 6 months ago
B: doesn't handle processing with banner inventory ---> not true...
upvoted 2 times
...
...
Celia20210714
Highly Voted 4 years, 3 months ago
ANS: C GAE + IAP https://medium.com/google-cloud/secure-cloud-run-cloud-functions-and-app-engine-with-api-key-73c57bededd1 Bigtable at low latency https://cloud.google.com/bigtable#section-2
upvoted 8 times
...
mayankblitzster
Most Recent 1 month, 3 weeks ago
Selected Answer: C
  • Client embedded on the website: captures user navigation context in real time and sends it securely to the backend via HTTPS.
  • Gateway on App Engine: acts as a secure and scalable entry point; handles authentication, rate limiting, and routing; can preprocess or validate navigation context before passing it on.
  • Database on Cloud Bigtable: stores and retrieves user navigation context efficiently; optimized for high-throughput, low-latency reads/writes; scales horizontally to support thousands of concurrent users.
  • Model on AI Platform Prediction (now Vertex AI): hosts your trained model for inference; automatically scales to meet latency requirements (300ms @ p99); integrates with IAM and VPC-SC for secure access; supports TensorFlow, PyTorch, and scikit-learn models.
upvoted 2 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: B
B - right answer
upvoted 1 times
...
ccb23cc
1 year, 4 months ago
Selected Answer: C
They affirm that navigation context is a good predictor for your model. Therefore you need to be able to perform the prediction, write the new context (if you get more data you will get a better model), and read it (to use it for your prediction). On one hand, BigQuery is an OLAP system, so writes and reads could take around 2 seconds. On the other hand, Bigtable is an OLTP system and can perform writes and reads in about 9 milliseconds. Conclusion: as one of the requirements is that latency has to be below 300ms, your only choice is Bigtable. https://galvarado.com.mx/post/comparaci%C3%B3n-de-bases-de-datos-en-google-cloud-datastore-vs-bigtable-vs-cloud-sql-vs-spanner-vs-bigquery/
upvoted 3 times
...
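The 300ms@p99 requirement in the question means 99% of requests must complete within 300 ms, so the gateway, the context read, and the model inference all share that latency budget. A quick stdlib check of a latency sample (the numbers below are made up for illustration):

```python
import math

# p99 check for the 300ms@p99 requirement: 99% of requests must finish
# within 300 ms, so every component's latency (gateway + context read +
# model inference) is budgeted against that total.

def percentile(samples, pct):
    # Nearest-rank percentile over a sorted copy of the samples.
    ordered = sorted(samples)
    rank = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[rank]

# Hypothetical end-to-end latencies in milliseconds.
latencies_ms = [120, 140, 95, 180, 250, 130, 160, 110, 290, 175]
p99 = percentile(latencies_ms, 99)
print(p99 <= 300)  # True: this sample meets 300ms@p99
```

This is why the comments focus on per-read latency of the context store: a 2-second analytical read (BigQuery) blows the budget on its own, while a single-digit-millisecond read (Bigtable or Memorystore) leaves room for inference.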
PhilipKoku
1 year, 5 months ago
Selected Answer: C
C) Big Table for low latency
upvoted 2 times
...
AnnaR
1 year, 6 months ago
Selected Answer: B
Was torn between B and C, but decided for B, because the question states how we should configure the PREDICTION pipeline! Since the exploratory analysis already identified navigation context as good predictor, the focus should be on the prediction model itself.
upvoted 4 times
...
gscharly
1 year, 6 months ago
Selected Answer: C
agree with Paul_Dirac
upvoted 2 times
...
rightcd
1 year, 8 months ago
look at Q80
upvoted 3 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: B
I was torn between B and C. But I really don't see the need for a DB
upvoted 3 times
Fer660
2 months, 2 weeks ago
I agree with this logic. A limited navigation context would not require a DB. Of course, if we need to look back hundreds of datapoints related to the navigation context we would use a DB, but the question does not seem to push in that direction. Alas, we all have to guess a bit of what that 'context' really means.
upvoted 1 times
...
...
Mickey321
1 year, 12 months ago
Selected Answer: B
Embed the client on the website, deploy the gateway on App Engine, and then deploy the model on AI Platform Prediction.
upvoted 1 times
...
harithacML
2 years, 4 months ago
Selected Answer: B
Security (gateway) + simplest (AI Platform, no DB).
upvoted 1 times
...
Liting
2 years, 4 months ago
Selected Answer: C
Bigtable is recommended for storage in this scenario.
upvoted 2 times
...
tavva_prudhvi
2 years, 4 months ago
Selected Answer: C
B is also a possible solution, but it does not include a database for storing and retrieving the user's navigation context. This means that every time a user visits a page, the gateway would need to query the website to retrieve the navigation context, which could be slow and inefficient. By using Cloud Bigtable to store the navigation context, the gateway can quickly retrieve the context from the database and pass it to the model for prediction. This makes the overall prediction pipeline more efficient and scalable. Therefore, C is a better option compared to B.
upvoted 6 times
...
friedi
2 years, 4 months ago
Selected Answer: B
B is correct, C introduces computational overhead, unnecessarily increasing serving latency.
upvoted 1 times
...
Voyager2
2 years, 5 months ago
Selected Answer: C
C. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Cloud Bigtable for writing and for reading the user's navigation context, and then deploy the model on AI Platform Prediction https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#choosing_a_nosql_database Typical use cases for Bigtable are: * Ad prediction that leverages dynamically aggregated values over all ad requests and historical data.
upvoted 2 times
...
CloudKida
2 years, 6 months ago
Selected Answer: C
Bigtable is a massively scalable NoSQL database service engineered for high throughput and for low-latency workloads. It can handle petabytes of data, with millions of reads and writes per second at a latency that's on the order of milliseconds. Typical use cases for Bigtable are: Fraud detection that leverages dynamically aggregated values. Applications in Fintech and Adtech are usually subject to heavy reads and writes. Ad prediction that leverages dynamically aggregated values over all ad requests and historical data. Booking recommendation based on the overall customer base's recent bookings.
upvoted 2 times
...

Topic 1 Question 50

Exam Professional Machine Learning Engineer topic 1 question 50 discussion

Your team is building a convolutional neural network (CNN)-based architecture from scratch. The preliminary experiments running on your on-premises CPU-only infrastructure were encouraging, but have slow convergence. You have been asked to speed up model training to reduce time-to-market. You want to experiment with virtual machines (VMs) on Google Cloud to leverage more powerful hardware. Your code does not include any manual device placement and has not been wrapped in Estimator model-level abstraction. Which environment should you train your model on?

  • A. A VM on Compute Engine and 1 TPU with all dependencies installed manually.
  • B. A VM on Compute Engine and 8 GPUs with all dependencies installed manually.
  • C. A Deep Learning VM with an n1-standard-2 machine and 1 GPU with all libraries pre-installed.
  • D. A Deep Learning VM with more powerful CPU e2-highcpu-16 machines with all libraries pre-installed.
Suggested Answer: C 🗳️

Comments

celia20200410
Highly Voted 4 years, 3 months ago
ANS: C to support CNN, you should use GPU. for preliminary experiment, pre-installed pkgs/libs are good choice. https://cloud.google.com/deep-learning-vm/docs/cli#creating_an_instance_with_one_or_more_gpus https://cloud.google.com/deep-learning-vm/docs/introduction#pre-installed_packages
upvoted 18 times
...
Paul_Dirac
Highly Voted 4 years, 3 months ago
Code without manual device placement => default to CPU if TPU is present or to the lowest order GPU if multiple GPUs are present. => Not A, B. D: already using CPU and needing GPU for CNN. Ans: C
upvoted 15 times
...
mayankblitzster
Most Recent 1 month, 3 weeks ago
Selected Answer: C
  1. Choose a GPU type: start with NVIDIA T4 for cost-efficiency or A100 for high performance.
  2. Spin up a Deep Learning VM: use the Google Cloud Console or gcloud CLI.
  3. Upload your code and data: you can use Cloud Storage buckets or SCP.
  4. Train and monitor: use TensorBoard or logging to monitor convergence.
upvoted 1 times
...
Fer660
2 months, 2 weeks ago
Selected Answer: C
Not A: the model may not be compatible with TPU -- there are specific requirements for this. Not B: 8 GPUs seems overkill as next step. Not D: GPU will very likely perform better than CPU C: Good idea to try some GPU and observe the speedup. Try first with pre-installed libraries for easier experimentation -- trim down or specialize later if needed.
upvoted 1 times
...
RyanTan
8 months, 1 week ago
Selected Answer: A
C is wrong because n1-standard-2 is too small for GPUs.
upvoted 1 times
Begum
6 months, 1 week ago
The question says "you want to experiment with Google VMs". Start with base configs: the N1 family (general-purpose VMs are more flexible for attaching GPUs).
upvoted 1 times
...
...
IrribarraC
8 months, 2 weeks ago
Selected Answer: C
Swapping CPU for GPU will speed up the training of a CNN a lot. Using pre-installed libraries incurs less risk, which means speeding up time-to-market.
upvoted 1 times
...
Pau1234
11 months, 2 weeks ago
Selected Answer: A
Option A is better because it is better to go with 1 TPU than 8 GPUs, especially when you don't have any manual placements.
upvoted 2 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: C
C) GPU and all pre-installed libraries.
upvoted 1 times
...
gscharly
1 year, 6 months ago
Selected Answer: C
Agree with celia20200410 - C
upvoted 1 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: C
Agree with celia20200410 - C
upvoted 2 times
...
Mickey321
1 year, 12 months ago
Selected Answer: D
keyword: Your code does not include any manual device placement and has not been wrapped in Estimator model-level abstraction.
upvoted 1 times
...
Liting
2 years, 4 months ago
Selected Answer: C
Should use the deep learning VM with GPU. TPU should be selected only if necessary, coz it incurs high cost. GPU in this case is enough.
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
Melampos
2 years, 6 months ago
Selected Answer: A
Thinking of the fastest way.
upvoted 1 times
...
SergioRubiano
2 years, 7 months ago
Selected Answer: C
You should use GPU.
upvoted 1 times
...
BenMS
2 years, 8 months ago
Selected Answer: D
Critical sentence: "Your code does not include any manual device placement and has not been wrapped in Estimator model-level abstraction." So the only answer we have is D.
upvoted 3 times
...
shankalman717
2 years, 8 months ago
Critical sentence: "Your code does not include any manual device placement and has not been wrapped in Estimator model-level abstraction." So the only answer we have is D.
upvoted 3 times
tavva_prudhvi
2 years, 4 months ago
Option D provides a more powerful CPU but does not include a GPU, which may not be optimal for deep learning training.
upvoted 2 times
...
...

Topic 1 Question 51

Exam Professional Machine Learning Engineer topic 1 question 51 discussion

You work on a growing team of more than 50 data scientists who all use AI Platform. You are designing a strategy to organize your jobs, models, and versions in a clean and scalable way. Which strategy should you choose?

  • A. Set up restrictive IAM permissions on the AI Platform notebooks so that only a single user or group can access a given instance.
  • B. Separate each data scientist's work into a different project to ensure that the jobs, models, and versions created by each data scientist are accessible only to that user.
  • C. Use labels to organize resources into descriptive categories. Apply a label to each created resource so that users can filter the results by label when viewing or monitoring the resources.
  • D. Set up a BigQuery sink for Cloud Logging logs that is appropriately filtered to capture information about AI Platform resource usage. In BigQuery, create a SQL view that maps users to the resources they are using
Suggested Answer: C 🗳️

Comments

chohan
Highly Voted 4 years, 4 months ago
I think it should be C, as IAM roles are given to the entire AI Notebook resource, not to a specific instance.
upvoted 14 times
...
celia20200410
Highly Voted 4 years, 3 months ago
ans: C https://cloud.google.com/ai-platform/prediction/docs/resource-labels#overview_of_labels You can add labels to your AI Platform Prediction jobs, models, and model versions, then use those labels to organize resources into categories when viewing or monitoring the resources. For example, you can label jobs by team (such as engineering or research) and development phase (prod or test), then filter the jobs based on the team and phase. Labels are also available on operations, but these labels are derived from the resource to which the operation applies. You cannot add or update labels on an operation. A label is a key-value pair, where both the key and the value are custom strings that you supply.
upvoted 12 times
vivid_cucumber
3 years, 12 months ago
I read through this page: https://cloud.google.com/ai-platform/prediction/docs/sharing-models. This one sounds more like A. Isn't that correct? I am not quite sure.
upvoted 1 times
vivid_cucumber
3 years, 12 months ago
or maybe A is not correct because "sharing models using IAM" only applies to "manage access to resources", but this question is more about asking how to "organize jobs, models, and versions". Not sure if my understanding is right or not.
upvoted 1 times
...
...
...
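The label mechanism quoted above boils down to key-value matching. A minimal sketch of the idea (the resource names and the "team"/"phase" label keys here are made up, and plain dicts stand in for AI Platform resources):

```python
# Sketch of answer C: organize resources with labels, then filter by them.
# Plain dicts stand in for AI Platform jobs/models; labels are key-value pairs.

def filter_by_labels(resources, **wanted):
    """Return resources whose labels contain every requested key/value."""
    return [
        r for r in resources
        if all(r.get("labels", {}).get(k) == v for k, v in wanted.items())
    ]

models = [
    {"name": "churn_model",  "labels": {"team": "research",    "phase": "test"}},
    {"name": "fraud_model",  "labels": {"team": "engineering", "phase": "prod"}},
    {"name": "ranker_model", "labels": {"team": "engineering", "phase": "test"}},
]

# Filter the way the docs describe: by team and development phase.
prod_eng = filter_by_labels(models, team="engineering", phase="prod")
print([m["name"] for m in prod_eng])  # ['fraud_model']
```

This scales to 50+ data scientists because labels are orthogonal to IAM and projects: everyone works in one place, and views are just filters.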
furix
Most Recent 1 year, 2 months ago
B. Setting up different resources in separate projects can help separate the use of resources. From the official guide book
upvoted 2 times
desertlotus1211
1 year ago
Creating separate projects for each data scientist would lead to significant overhead in managing resources and permissions across numerous projects, making it harder to scale and collaborate.
upvoted 2 times
...
desertlotus1211
1 year ago
I thought the same, but... se my answer below
upvoted 1 times
...
...
PhilipKoku
1 year, 5 months ago
Selected Answer: C
C) labels
upvoted 1 times
...
Sum_Sum
1 year, 12 months ago
C Although there are some questions where setting up a logging sink to BQ is the answer.
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
BenMS
2 years, 8 months ago
Selected Answer: C
Restricting access is not scalable and creates silos - better to document sharable resources through tagging, hence C.
upvoted 1 times
...
hiromi
2 years, 11 months ago
Selected Answer: C
C Resource tagging/labeling is the best way to manage ML resources for medium/big data science teams.
upvoted 1 times
...
ggorzki
3 years, 9 months ago
Selected Answer: C
https://cloud.google.com/ai-platform/prediction/docs/resource-labels#overview_of_labels (A) applies only to notebooks, which is not enough
upvoted 4 times
...

Topic 1 Question 52

Exam Professional Machine Learning Engineer topic 1 question 52 discussion

You are training a deep learning model for semantic image segmentation with reduced training time. While using a Deep Learning VM Image, you receive the following error: The resource 'projects/deeplearning-platforn/zones/europe-west4-c/acceleratorTypes/nvidia-tesla-k80' was not found. What should you do?

  • A. Ensure that you have GPU quota in the selected region.
  • B. Ensure that the required GPU is available in the selected region.
  • C. Ensure that you have preemptible GPU quota in the selected region.
  • D. Ensure that the selected GPU has enough GPU memory for the workload.
Suggested Answer: B 🗳️

Comments

celia20200410
Highly Voted 4 years, 3 months ago
ANS: B https://cloud.google.com/deep-learning-vm/docs/troubleshooting#resource_not_found https://cloud.google.com/compute/docs/gpus/gpu-regions-zones Resource not found Symptom: - The resource 'projects/deeplearning-platform/zones/europe-west4-c/acceleratorTypes/nvidia-tesla-k80' was not found Problem: You are trying to create an instance with one or more GPUs in a region where GPUs are not available (for example, an instance with a K80 GPU in europe-west4-c). Solution: To determine which region has the required GPU, see GPUs on Compute Engine.
upvoted 24 times
...
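The troubleshooting page linked above describes this error as an availability problem, not a quota problem. A toy sketch of the distinction (the zone-to-GPU availability map below is illustrative, not real Compute Engine data):

```python
# Sketch of answer B: "resource not found" means the accelerator type does
# not exist in that zone at all; quota errors look different. The map below
# is a made-up stand-in for the real GPU regions/zones table.

GPU_ZONES = {
    "nvidia-tesla-k80": {"us-central1-a", "europe-west1-b"},
    "nvidia-tesla-t4":  {"europe-west4-c", "us-central1-a"},
}

def check_accelerator(gpu_type, zone):
    """Return 'ok' if the GPU type is offered in the zone, else a
    'not found' message mirroring the exam's error."""
    if zone not in GPU_ZONES.get(gpu_type, set()):
        return f"acceleratorTypes/{gpu_type} was not found in {zone}"
    return "ok"

# A K80 in europe-west4-c reproduces the question's situation:
print(check_accelerator("nvidia-tesla-k80", "europe-west4-c"))
```

The fix is therefore to pick a zone where the required GPU exists (e.g. via the GPU regions and zones page), not to request more quota.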
stomcarlo
Highly Voted 4 years, 5 months ago
it is B, the error message relates to Quota is different: https://cloud.google.com/deep-learning-vm/docs/troubleshooting#resource_not_found
upvoted 10 times
...
PhilipKoku
Most Recent 1 year, 5 months ago
Selected Answer: B
B) GPUs are only available in specific regions and zones
upvoted 1 times
...
fragkris
1 year, 11 months ago
Selected Answer: B
Not all resources can be found in any region. Therefore - B
upvoted 1 times
...
abhay669
1 year, 11 months ago
Selected Answer: B
It is clearly mentioned here: https://cloud.google.com/deep-learning-vm/docs/troubleshooting
upvoted 1 times
...
Sum_Sum
1 year, 12 months ago
Selected Answer: B
B - because it's "cant be found"
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 1 times
...
BenMS
2 years, 8 months ago
Selected Answer: B
The error says the resource was not found - hence B. If quota was the problem (A) then you'd see a different error message.
upvoted 2 times
...
hiromi
2 years, 11 months ago
Selected Answer: B
B obviously
upvoted 2 times
...
_luigi_
3 years, 6 months ago
Selected Answer: B
The resource is not found because it doesn't exist in the region.
upvoted 3 times
...
mmona19
3 years, 7 months ago
Selected Answer: A
The question is asking what you should do, not why the error occurs. Answer should be A: if you get that exception, make sure to check your instance quota before running the job.
upvoted 1 times
desertlotus1211
1 year ago
Wrong - it's a resource availability error.
upvoted 1 times
...
...
ggorzki
3 years, 9 months ago
Selected Answer: B
https://cloud.google.com/deep-learning-vm/docs/troubleshooting#resource_not_found
upvoted 2 times
...

Topic 1 Question 53

Exam Professional Machine Learning Engineer topic 1 question 53 discussion

Your team is working on an NLP research project to predict political affiliation of authors based on articles they have written. You have a large training dataset that is structured like this:

You followed the standard 80%-10%-10% data distribution across the training, testing, and evaluation subsets. How should you distribute the training examples across the train-test-eval subsets while maintaining the 80-10-10 proportion?

  • A. Distribute texts randomly across the train-test-eval subsets: Train set: [TextA1, TextB2, ...] Test set: [TextA2, TextC1, TextD2, ...] Eval set: [TextB1, TextC2, TextD1, ...]
  • B. Distribute authors randomly across the train-test-eval subsets: (*) Train set: [TextA1, TextA2, TextD1, TextD2, ...] Test set: [TextB1, TextB2, ...] Eval set: [TextC1, TextC2, ...]
  • C. Distribute sentences randomly across the train-test-eval subsets: Train set: [SentenceA11, SentenceA21, SentenceB11, SentenceB21, SentenceC11, SentenceD21 ...] Test set: [SentenceA12, SentenceA22, SentenceB12, SentenceC22, SentenceC12, SentenceD22 ...] Eval set: [SentenceA13, SentenceA23, SentenceB13, SentenceC23, SentenceC13, SentenceD31 ...]
  • D. Distribute paragraphs of texts (i.e., chunks of consecutive sentences) across the train-test-eval subsets: Train set: [SentenceA11, SentenceA12, SentenceD11, SentenceD12 ...] Test set: [SentenceA13, SentenceB13, SentenceB21, SentenceD23, SentenceC12, SentenceD13 ...] Eval set: [SentenceA11, SentenceA22, SentenceB13, SentenceD22, SentenceC23, SentenceD11 ...]
Suggested Answer: B 🗳️

Comments

rc380
Highly Voted 4 years, 2 months ago
I think since we are predicting political leaning of authors, perhaps distributing authors make more sense? (B)
upvoted 22 times
jk73
4 years, 1 month ago
Exactly! I also consider it B. Check this out: if we just put random texts, paragraphs, or sentences inside the training, validation, and test sets, the model will have the ability to learn specific qualities about an author's use of language beyond just his own articles, so the model will mix up different opinions. If instead we divide things up at the author level, so that a given author appears only in the training data, or only in the test data, or only in the validation data, the model will find it more difficult to get high accuracy on the test/validation sets (which is correct and makes more sense!), because it will need to really focus on author-by-author articles rather than infer a single political affiliation from a bunch of mixed articles by different authors. https://developers.google.com/machine-learning/crash-course/18th-century-literature
upvoted 13 times
...
sensev
4 years, 2 months ago
Agree it should be B. Since every author has his/her distinct style, splitting different texts from the same author across different sets could result in data label leakage.
upvoted 10 times
dxxdd7
4 years, 2 months ago
I don't agree as we want to know the political affiliation from a text and not based on an author. I think A is better
upvoted 3 times
jk73
4 years, 1 month ago
It is the political affiliation from a text, but to whom does that text belong? The statement clearly says: predict political affiliation of authors based on articles they have written. Hence the political affiliation is for each author, according to the texts he wrote.
upvoted 4 times
...
...
...
...
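The author-level split argued for above can be done deterministically by hashing the author (not the text) into buckets. A minimal sketch (the author names and the bucket-to-subset mapping are illustrative):

```python
# Sketch of answer B: split at the author level so no author's texts leak
# across train/test/eval. Hash the author into 10 buckets, then map
# buckets 0-7 -> train, 8 -> test, 9 -> eval (the 80/10/10 proportion).
import hashlib

def author_bucket(author, buckets=10):
    # md5 gives a stable hash across runs (unlike Python's hash()).
    digest = hashlib.md5(author.encode()).hexdigest()
    return int(digest, 16) % buckets

def assign_split(author):
    b = author_bucket(author)
    if b < 8:
        return "train"
    return "test" if b == 8 else "eval"

texts = [("AuthorA", "Text A1"), ("AuthorA", "Text A2"),
         ("AuthorB", "Text B1"), ("AuthorB", "Text B2")]

# Every text by a given author lands in the same subset:
splits = {author: assign_split(author) for author, _ in texts}
print(splits)
```

Because the split key is the author, the model can never exploit author-specific writing style across subsets, which is exactly the leakage the 18th-century-literature example warns about.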
inder0007
Highly Voted 4 years, 4 months ago
Should be A, we are trying to get a label on the entire text so only A makes sense
upvoted 8 times
GogoG
4 years, 1 month ago
Correct answer is B - https://developers.google.com/machine-learning/crash-course/18th-century-literature
upvoted 5 times
Dunnoth
2 years, 8 months ago
This is a known study. If you use A, the moment a new author appears in the test set, the accuracy is way lower than what your metrics might suggest. To have realistic evaluation results it should be B. Also note that the label is for the "author", not a text.
upvoted 1 times
...
...
...
chibuzorrr
Most Recent 11 months, 1 week ago
Selected Answer: A
I think A. The B option training set would not contain text from authors supporting party B
upvoted 2 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: B
B) Authors
upvoted 1 times
...
girgu
1 year, 5 months ago
Selected Answer: B
We have to divide/split at the author level. Otherwise the model will use the text-to-author relationship, but we want to find the text-to-political-affiliation relationship. At prediction time we already know the text-to-author relation; what we want is the text-to-political relation (and therefore the author-to-political relation is implied).
upvoted 1 times
...
tavva_prudhvi
2 years, 4 months ago
Selected Answer: B
This is the best approach as it ensures that the data is distributed in a way that is representative of the overall population. By randomly distributing authors across the subsets, we ensure that each subset has a similar distribution of political affiliations. This helps to minimize bias and increases the likelihood that our model will generalize well to new data. Distributing texts randomly or by sentences or paragraphs may result in subsets that are biased towards a particular political affiliation. This could lead to overfitting and poor generalization performance. Therefore, it is important to distribute the data in a way that maintains the overall distribution of political affiliations across the subsets.
upvoted 3 times
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 1 times
...
John_Pongthorn
2 years, 8 months ago
Selected Answer: B
https://cloud.google.com/automl-tables/docs/prepare#split https://developers.google.com/machine-learning/crash-course/18th-century-literature
upvoted 1 times
...
enghabeth
2 years, 9 months ago
Selected Answer: B
Ans B The model is to predict which political party the author belongs to, not which political party the text belongs to... You do not have the information of the political party of each text, you are assuming that the texts are associated with the political party of the author.
upvoted 1 times
...
bL357A
3 years, 2 months ago
Selected Answer: A
label is party, feature is text
upvoted 2 times
...
suresh_vn
3 years, 3 months ago
IMO, B is correct; A, C, and D leak labels.
upvoted 1 times
...
ggorzki
3 years, 9 months ago
Selected Answer: B
https://developers.google.com/machine-learning/crash-course/18th-century-literature Split by authors, otherwise there will be data leakage - the model will get the ability to learn author specific use of language
upvoted 6 times
...
NamitSehgal
3 years, 10 months ago
B I agree
upvoted 1 times
...
JobQ
3 years, 10 months ago
I already saw the video in: https://developers.google.com/machine-learning/crash-course/18th-century-literature Based on this video I concluded that the answer is A. What answer B is saying is that you will have Author B's texts in the training set, Author A's texts in the testing set and Author C's texts in the validation set. According to the video B is incorrect. We want to have texts from author A in the training, testing and validation set. So A is correct. I think most people are choosing B because the word "author" but let's be careful.
upvoted 2 times
giaZ
3 years, 8 months ago
I thought the same initially, but no. We'd want texts from author A in the training, testing, and validation sets if the task was to predict the author from a text (meaning, if the label was the author, right? You train the model to learn the style of a text and connect it to an author; you'd need new texts from the same author in the test and validation sets to see if the model is able to recognize him/her). HERE, the task is to predict political affiliation from a text of an author. The author is given. In the test and validation sets you need new authors, to see whether the model is able to guess their political affiliation. So you would do 80 authors (and corresponding texts) for training, 10 different authors for validation, and 10 different ones for test.
upvoted 6 times
...
...
pddddd
4 years, 1 month ago
Partition by author - there is an actual example in Coursera 'Production ML systems' course
upvoted 1 times
...
Macgogo
4 years, 1 month ago
I think it is B. -- Your test data includes data from populations that will not be represented in production. For example, suppose you are training a model with purchase data from a number of stores. You know, however, that the model will be used primarily to make predictions for stores that are not in the training data. To ensure that the model can generalize to unseen stores, you should segregate your data sets by stores. In other words, your test set should include only stores different from the evaluation set, and the evaluation set should include only stores different from the training set. https://cloud.google.com/automl-tables/docs/prepare#ml-use
upvoted 4 times
...
Danny2021
4 years, 2 months ago
Should be D. Please see the dataset provided, it is based on the text / paragraphs.
upvoted 1 times
george_ognyanov
4 years ago
Have a look at the link the others have already provided twice. Splitting sentence by sentence is literally mentioned in said video as a bad example and something we should not do in this case.
upvoted 1 times
...
...

Topic 1 Question 54

Exam Professional Machine Learning Engineer topic 1 question 54 discussion

Your team has been tasked with creating an ML solution in Google Cloud to classify support requests for one of your platforms. You analyzed the requirements and decided to use TensorFlow to build the classifier so that you have full control of the model's code, serving, and deployment. You will use Kubeflow pipelines for the ML platform. To save time, you want to build on existing resources and use managed services instead of building a completely new model. How should you build the classifier?

  • A. Use the Natural Language API to classify support requests.
  • B. Use AutoML Natural Language to build the support requests classifier.
  • C. Use an established text classification model on AI Platform to perform transfer learning.
  • D. Use an established text classification model on AI Platform as-is to classify support requests.
Suggested Answer: C 🗳️

Comments

arbik
Highly Voted 3 years, 9 months ago
ANS: C as you want to have full control of the model code.
upvoted 29 times
...
Celia20210714
Highly Voted 3 years, 9 months ago
ANS: D https://cloud.google.com/ai-platform/training/docs/algorithms - to use TensorFlow - to build on existing resources - to use managed services
upvoted 11 times
george_ognyanov
3 years, 7 months ago
While D is very close for me, I think there are 2 giveaways here: "To save time, you want to build on existing resources" - transfer learning; "instead of building a completely new model" - answer D leaves the model as-is. ANS: C
upvoted 3 times
...
ms_lemon
3 years, 7 months ago
the model cannot work as-is as the classes to predict will likely not be the same; we need to use transfer learning to retrain the last layer and adapt it to the classes we need, hence C
upvoted 7 times
...
...
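The "retrain only the last layer" idea from the comments above can be sketched without any real pretrained model: keep a frozen featurizer fixed and train only a small classification head on the new classes. Everything here (the keyword featurizer, the toy support requests, the perceptron head) is a made-up stand-in for an established text model on AI Platform:

```python
# Sketch of answer C: transfer learning = frozen base + retrained head.

def frozen_featurizer(text):
    """Pretend pretrained embedding: fixed, never retrained."""
    words = text.lower().split()
    return [
        sum(1 for w in words if "error" in w or "crash" in w),    # tech-ish
        sum(1 for w in words if "refund" in w or "charge" in w),  # billing-ish
    ]

def train_head(examples, epochs=10, lr=1.0):
    """Perceptron head over frozen features: the only part we retrain."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in examples:            # label: 1=billing, 0=technical
            x = frozen_featurizer(text)
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = label - pred
            w = [w[0] + lr * err * x[0], w[1] + lr * err * x[1]]
            b += lr * err
    return w, b

train_set = [("app crash on login error", 0),
             ("please refund the double charge", 1),
             ("crash report attached", 0),
             ("charge appeared twice, refund?", 1)]
w, b = train_head(train_set)

def classify(text):
    x = frozen_featurizer(text)
    return "billing" if w[0] * x[0] + w[1] * x[1] + b > 0 else "technical"

print(classify("why was my card charged"))  # billing
```

This is why C beats D: the established model's features are reused, but the head must be retrained because the support-request classes won't match the original model's classes.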
PhilipKoku
Most Recent 11 months, 1 week ago
Selected Answer: C
C) Transfer learning
upvoted 1 times
...
M25
2 years ago
Selected Answer: C
Went with C
upvoted 3 times
...
Dunnoth
2 years, 2 months ago
Selected Answer: C
Using TensorFlow, you can build a simple model with a sentence embedding and a single-layer classifier.
upvoted 1 times
...
enghabeth
2 years, 3 months ago
Selected Answer: D
you don't need transfer learning in this case
upvoted 1 times
...
Mohamed_Mossad
2 years, 11 months ago
Selected Answer: C
- "You analyzed the requirements and decided to use TensorFlow": this reduces the choices to C and D. - "so that you have full control of the model's code": this makes us choose C.
upvoted 2 times
...
David_ml
3 years ago
Selected Answer: C
Answer is C.
upvoted 1 times
...
MasterMath
3 years ago
According to me it is B. A is not correct, as it uses an API call only and we won't build the system on existing resources. For C and D, I do not see in AI Platform (Vertex AI) an established text classification model that can be used. B is the right answer: you have the labeled data, so you can drop the custom TF code and build a classifier with AutoML Natural Language.
upvoted 1 times
David_ml
3 years ago
B is wrong. question says " you have full control of the model's code". You don't have full control of automl code. The right answer is C.
upvoted 1 times
...
...
giaZ
3 years, 2 months ago
Selected Answer: C
"full control of the model's code, serving, and deployment": Not A nor B. and "you want to build on existing resources and use managed services": Not D (that's "as-is") You want transfer learning.
upvoted 5 times
...
NamitSehgal
3 years, 4 months ago
Cis correct
upvoted 1 times
...
george_ognyanov
3 years, 7 months ago
ANS: C according to me as well. As arbik said, full control, custom model are give aways.
upvoted 1 times
...

Topic 1 Question 55

Exam Professional Machine Learning Engineer topic 1 question 55 discussion

You recently joined a machine learning team that will soon release a new project. As a lead on the project, you are asked to determine the production readiness of the ML components. The team has already tested features and data, model development, and infrastructure. Which additional readiness check should you recommend to the team?

  • A. Ensure that training is reproducible.
  • B. Ensure that all hyperparameters are tuned.
  • C. Ensure that model performance is monitored.
  • D. Ensure that feature expectations are captured in the schema.
Suggested Answer: C 🗳️

Comments

inder0007
Highly Voted 3 years, 10 months ago
I think it should be C
upvoted 22 times
simoncerda
3 years, 5 months ago
I also think it's C; reference: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/aad9f93b86b7addfea4c419b9100c6cdd26cacea.pdf
upvoted 1 times
...
omar_bh
3 years, 9 months ago
Performance monitoring is a continuous effort that happens all the time, but reproducibility makes more sense to add to model QA.
upvoted 4 times
sensev
3 years, 9 months ago
The question was not about model QA but production readiness, thus I think the answer is C, because monitoring model performance in production is important. As regards A, I would argue it could fall under "model development", since reproducible training is already important during model development.
upvoted 4 times
vivid_cucumber
3 years, 6 months ago
To my understanding, I think A might be correct, since model performance monitoring happens "in production". But the question said the project "will soon release", which means right now is before launching, so to me testing reproducibility would make more sense. (I was confused between A and C for a long time.) Reference: - Testing reproducibility: https://developers.google.com/machine-learning/testing-debugging/pipeline/deploying - Testing in Production: https://developers.google.com/machine-learning/testing-debugging/pipeline/production
upvoted 7 times
...
...
...
...
ralf_cc
Highly Voted 3 years, 10 months ago
A - important one before moving to the production
upvoted 9 times
salsabilsf
3 years, 9 months ago
Testing for Deploying Machine Learning Models: - Test Model Updates with Reproducible Training https://developers.google.com/machine-learning/testing-debugging/pipeline/deploying
upvoted 5 times
...
...
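Option A's "training is reproducible" check from the linked testing guide can be illustrated with plain Python: fix every source of randomness so two runs produce identical results. The train() function below is a toy stand-in for a real training loop:

```python
# Sketch of what "training is reproducible" (option A) means in practice:
# seed all randomness so identical inputs yield identical outputs.
import random

def train(seed):
    rng = random.Random(seed)                  # seeded RNG, no global state
    weights = [rng.gauss(0, 1) for _ in range(3)]  # "weight initialization"
    data = list(range(10))
    rng.shuffle(data)                          # "data shuffling"
    return weights, data

run1 = train(seed=42)
run2 = train(seed=42)
print(run1 == run2)  # True: identical runs given identical seeds
```

In a real framework the same idea covers framework seeds, data ordering, and pinned dependency versions; the debate above is whether this check belongs to model development (already tested) or production readiness.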
PhilipKoku
Most Recent 11 months, 1 week ago
Selected Answer: C
C) Model monitoring
upvoted 1 times
...
SahandJ
1 year ago
C is not a readiness check. Monitoring is a continuous effort. IMO A is the correct answer. If the training is not reproducible it's not ready for production. If any error happens, data drifts / skews, then there is no way to recreate the model. This is a check BEFORE going to production. Once it's in production, then yes C is important.
upvoted 1 times
...
fragkris
1 year, 5 months ago
Selected Answer: C
Monitoring is crucial. So - C
upvoted 2 times
...
M25
2 years ago
Selected Answer: C
Went with C
upvoted 1 times
...
e707
2 years ago
Selected Answer: C
I'll go with C. Monitoring model performance is an important aspect of production readiness. It allows the team to detect and respond to changes in performance that may affect the quality of the model. The other options are also important, but they are more focused on the development phase of the project rather than the production phase.
upvoted 1 times
...
John_Pongthorn
2 years, 2 months ago
Selected Answer: C
Hey, all! A+B+D = "The team has already tested features and data, model development, and infrastructure." We are about to go live in production, so monitoring readiness is the last thing to account for. It would be ridiculous to launch a model to production without any plan for monitoring - launch the model for a while and only later make a plan for performance monitoring? That is too reckless. Please read carefully: https://developers.google.com/machine-learning/testing-debugging/pipeline/production https://developers.google.com/machine-learning/testing-debugging/pipeline/overview#what-is-an-ml-pipeline Most of you prefer A: https://developers.google.com/machine-learning/testing-debugging/pipeline/deploying - but I think that is all about model development, prior to deploying.
upvoted 4 times
...
enghabeth
2 years, 3 months ago
Selected Answer: C
I think the team already ensured that all hyperparameters were tuned when they tested features. It's more important that they ensure model performance is monitored than that training is reproducible, per best practices. https://cloud.google.com/architecture/ml-on-gcp-best-practices
upvoted 1 times
...
John_Pongthorn
2 years, 3 months ago
Selected Answer: C
Reproducible training is more likely to belong to the deployment step, which the question already covers with "The team has already tested features and data, model development"; the question focuses on production readiness. https://developers.google.com/machine-learning/testing-debugging/pipeline/production The Monitor section is part of the link above.
upvoted 1 times
...
ares81
2 years, 4 months ago
Selected Answer: C
C, for me.
upvoted 1 times
...
vakati
2 years, 6 months ago
Selected Answer: C
It's mentioned that the team has already tested features and data, implying that data generation is reproducible. If you have to test features, data has to be reproducible to compare model outputs. (https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/randomization). Hence C makes more sense.
upvoted 2 times
...
bL357A
2 years, 8 months ago
Selected Answer: C
https://cloud.google.com/ai-platform/docs/ml-solutions-overview
upvoted 1 times
...
u_phoria
2 years, 10 months ago
Selected Answer: C
With the specific focus on "production readiness" as stated, I'd pick C above the others.
upvoted 2 times
...
KD1988
2 years, 10 months ago
I think it's C. A is related to infrastructure, B is related to model development and D is related to Data and features. It clearly mentioned that team has already tested for model development, data and features and infrastructure.
upvoted 1 times
...
Mohamed_Mossad
2 years, 11 months ago
Selected Answer: A
"Production readiness" means that we are still in the dev-test phase, and "performance monitoring" happens in production. And what if monitoring is applied but retraining the model is difficult? So "A" is the best answer.
upvoted 1 times
...
abc0000
3 years, 2 months ago
A makes more sense than C.
upvoted 2 times
...

Topic 1 Question 56

Exam Professional Machine Learning Engineer topic 1 question 56 discussion

You work for a credit card company and have been asked to create a custom fraud detection model based on historical data using AutoML Tables. You need to prioritize detection of fraudulent transactions while minimizing false positives. Which optimization objective should you use when training the model?

  • A. An optimization objective that minimizes Log loss
  • B. An optimization objective that maximizes the Precision at a Recall value of 0.50
  • C. An optimization objective that maximizes the area under the precision-recall curve (AUC PR) value
  • D. An optimization objective that maximizes the area under the receiver operating characteristic curve (AUC ROC) value
Suggested Answer: C 🗳️

Comments

Paul_Dirac
Highly Voted 4 years, 3 months ago
This is a case of imbalanced data. Ans: C https://stats.stackexchange.com/questions/262616/roc-vs-precision-recall-curves-on-imbalanced-dataset https://neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc
upvoted 23 times
GogoG
4 years ago
C is wrong - the correct answer is D. ROC basically compares True Positives against False Negatives, exactly what we are trying to optimise for.
upvoted 2 times
...
...
ralf_cc
Highly Voted 4 years, 4 months ago
D - https://en.wikipedia.org/wiki/Receiver_operating_characteristic
upvoted 8 times
omar_bh
4 years, 3 months ago
True. The true positive is presented by Y axis. The bigger the area the graph take, the higher TP ratio
upvoted 2 times
tavva_prudhvi
2 years, 3 months ago
A larger area under the ROC curve does indicate a better model performance in terms of correctly identifying true positives. However, it does not take into account the imbalance in the class distribution or the costs associated with false positives and false negatives. In contrast, the AUC PR curve focuses on the trade-off between precision (Y-axis) and recall (X-axis), making it more suitable for imbalanced datasets and applications with different costs for false positives and false negatives, like credit card fraud detection.
upvoted 2 times
...
...
tavva_prudhvi
2 years, 3 months ago
AUC ROC is more suitable when the class distribution is balanced and false positives and false negatives have similar costs. In the case of credit card fraud detection, the class distribution is typically imbalanced (fewer fraudulent transactions compared to non-fraudulent ones), and the cost of false positives (incorrectly identifying a transaction as fraudulent) and false negatives (failing to detect a fraudulent transaction) are not the same. By maximizing the AUC PR (area under the precision-recall curve), the model focuses on the trade-off between precision (proportion of true positives among predicted positives) and recall (proportion of true positives among actual positives), which is more relevant in imbalanced datasets and for applications where the costs of false positives and false negatives are not equal. This makes option C a better choice for credit card fraud detection.
upvoted 3 times
...
...
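The imbalance argument above can be made concrete by computing both metrics from first principles on a tiny made-up fraud sample (2 positives among 20 transactions): a model that ranks a couple of negatives above a fraud case keeps a high ROC AUC while its PR AUC drops noticeably.

```python
# Sketch of why AUC PR (answer C) is more sensitive for rare fraud than
# AUC ROC. Scores and labels below are illustrative, not real data.

def roc_auc(labels, scores):
    # Probability that a random positive outranks a random negative.
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def pr_auc(labels, scores):
    # Average precision: precision measured at each positive's rank.
    ranked = sorted(zip(scores, labels), reverse=True)
    tp, ap = 0, 0.0
    for i, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            ap += tp / i
    return ap / sum(labels)

# 2 frauds among 20 transactions; two negatives outrank the second fraud.
labels = [1, 0, 0, 1] + [0] * 16
scores = [0.9, 0.8, 0.7, 0.6] + [s / 100 for s in range(16)]

print(round(roc_auc(labels, scores), 3))  # 0.944 - still looks great
print(round(pr_auc(labels, scores), 3))   # 0.75  - the misranking shows
```

ROC AUC stays near 1 because the 18 negatives dominate its denominator; PR AUC penalizes every false positive ranked above a fraud case, which is exactly the "detect fraud while minimizing false positives" objective in the question.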
OpenKnowledge
Most Recent 1 month, 1 week ago
Selected Answer: C
AUC-ROC (Area Under the Receiver Operating Characteristic) and AUC-PR (Area Under the Precision-Recall Curve) are both metrics for evaluating binary classifiers, but AUC-ROC measures the trade-off between true positive rate and false positive rate, making it less sensitive to class imbalance and more appropriate for balanced class distribution, while AUC-PR focuses on the minority positive class by evaluating precision and recall, making it more appropriate for imbalanced datasets where correctly identifying positives is critical, such as in fraud detection or spam detection.
upvoted 1 times
...
jkkim_jt
1 year ago
Selected Answer: C
o AUC-PR focuses on how well the classifier performs for the positive class (precision and recall are both concerned with positives) --> more suitable when the focus is on identifying the positive class in imbalanced data
o AUC-ROC looks at the trade-off between the true positive rate (sensitivity) and the false positive rate, considering both classes --> a general-purpose metric that works well when both classes are of similar size ( ChatGPT )
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: C
C) PR (Precision Recall)
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: C
C) PR ROC
upvoted 1 times
...
tavva_prudhvi
2 years, 4 months ago
Selected Answer: C
In fraud detection, it's crucial to minimize false positives (transactions flagged as fraudulent but are actually legitimate) while still detecting as many fraudulent transactions as possible. AUC PR is a suitable optimization objective for this scenario because it provides a balanced trade-off between precision and recall, which are both important metrics in fraud detection. A high AUC PR value indicates that the model has high precision and recall, which means it can detect a large number of fraudulent transactions while minimizing false positives. Log loss (A) and AUC ROC (D) are also commonly used optimization objectives in machine learning, but they may not be as effective in this particular scenario. Precision at a Recall value of 0.50 (B) is a specific metric and not an optimization objective.
upvoted 4 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
John_Pongthorn
2 years, 8 months ago
Selected Answer: C
Hi everyone, I discovered some clues that this question likely refers to the last section of https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc. This is what it tries to tell us, especially the last sentence: "Classification-threshold invariance is not always desirable. In cases where there are wide disparities in the cost of false negatives vs. false positives, it may be critical to minimize one type of classification error. For example, when doing email spam detection, you likely want to prioritize minimizing false positives (even if that results in a significant increase of false negatives). AUC isn't a useful metric for this type of optimization." Additionally, it tells me which of the choices is the answer to this question: https://cloud.google.com/automl-tables/docs/train#opt-obj
upvoted 1 times
...
enghabeth
2 years, 9 months ago
Selected Answer: D
What is different however is that ROC AUC looks at a true positive rate TPR and false positive rate FPR while PR AUC looks at positive predictive value PPV and true positive rate TPR. Detect Fraudulent transactions = Max TP Minimizing false positives -> min FP https://neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc#:~:text=ROC%20AUC%20vs%20PR%20AUC&text=What%20is%20different%20however%20is,and%20true%20positive%20rate%20TPR
upvoted 1 times
...
John_Pongthorn
2 years, 9 months ago
Selected Answer: C
Detection of fraudulent transactions seems to be imbalanced data. https://cloud.google.com/automl-tables/docs/train#opt-obj
AUC ROC: distinguish between classes. Default value for binary classification.
AUC PR: optimize results for predictions for the less common class.
It is straightforward to answer; you just have to capture the keyword (almost balanced or imbalanced) to find the right way.
https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/
When to use ROC vs. Precision-Recall curves? Generally: ROC curves should be used when there are roughly equal numbers of observations for each class. Precision-Recall curves should be used when there is a moderate to large class imbalance.
upvoted 3 times
...
ares81
2 years, 10 months ago
Selected Answer: C
Fraud Detection --> Imbalanced Dataset ---> AUC PR --> C, for me
upvoted 1 times
...
wish0035
2 years, 11 months ago
Selected Answer: C
ans: C Paul_Dirac and giaZ are correct.
upvoted 1 times
...
hiromi
2 years, 11 months ago
Selected Answer: C
C https://towardsdatascience.com/on-roc-and-precision-recall-curves-c23e9b63820c
upvoted 2 times
...
itallix
3 years, 2 months ago
"You need to prioritize detection of fraudulent transactions while minimizing false positives." Seems that answer B fits this well. If we want to focus exactly on minimizing false positives we can do that by maximising Precision at a specific Recall value. C is about balance between these two, and D doesn't care about false positive/negatives.
upvoted 2 times
...
suresh_vn
3 years, 2 months ago
Selected Answer: D
D https://en.wikipedia.org/wiki/Receiver_operating_characteristic C optimizes precision only
upvoted 1 times
suresh_vn
3 years, 2 months ago
Sorry, C is my final decision https://cloud.google.com/automl-tables/docs/train#opt-obj
upvoted 1 times
...
...
rtnk22
3 years, 3 months ago
Selected Answer: C
Answer is c.
upvoted 1 times
...

Topic 1 Question 57


Exam Professional Machine Learning Engineer topic 1 question 57 discussion

Your company manages a video sharing website where users can watch and upload videos. You need to create an ML model to predict which newly uploaded videos will be the most popular so that those videos can be prioritized on your company's website. Which result should you use to determine whether the model is successful?

  • A. The model predicts videos as popular if the user who uploads them has over 10,000 likes.
  • B. The model predicts 97.5% of the most popular clickbait videos measured by number of clicks.
  • C. The model predicts 95% of the most popular videos measured by watch time within 30 days of being uploaded.
  • D. The Pearson correlation coefficient between the log-transformed number of views after 7 days and 30 days after publication is equal to 0.
Suggested Answer: C 🗳️

Comments

Paul_Dirac
Highly Voted 3 years, 10 months ago
Ans: C (See https://developers.google.com/machine-learning/problem-framing/framing#quantify-it; though it's just an example.) (A) The absolute number of likes shouldn't be used because no information about subscribers or visits to the website is provided. The number may vary. (B) Clickbait videos are a subset of uploaded videos. Using them is an improper criterion. (D) The coefficient should reach 1. (Ref:https://arxiv.org/pdf/1510.06223.pdf)
upvoted 22 times
sensev
3 years, 9 months ago
Thanks for the detailed answer and reference!
upvoted 5 times
...
...
moammary
Most Recent 9 months, 3 weeks ago
Selected Answer: A
The answer is A. Because the number of previous user likes is the only feature available on inference time (when the video has just been uploaded). Watch time and clicks are unavailable at inference time and should not be used for training!
upvoted 1 times
b7ad1d9
1 month, 3 weeks ago
The question is asking for the condition under which the model would be considered successful, i.e. the question is about model performance evaluation, not model prediction.
upvoted 1 times
...
...
PhilipKoku
11 months, 1 week ago
Selected Answer: C
C) Watch time
upvoted 1 times
...
M25
2 years ago
Selected Answer: C
Went with C
upvoted 1 times
...
wish0035
2 years, 4 months ago
ans: C In this type of questions, I think a good idea is trying to copy already existing solutions. For this case, YouTube cares a lot about watchtime. In a previous question, Amazon implemented "Usually buy together" for maximizing profit.
upvoted 4 times
...
hiromi
2 years, 4 months ago
Selected Answer: C
Must be C
upvoted 1 times
...
Mohamed_Mossad
2 years, 10 months ago
Selected Answer: C
watch time among all other options is the most KPI to rely on
upvoted 2 times
...
baimus
3 years, 1 month ago
I think this is B. The question specifies "popular" and also that "newly uploaded" videos need prioritising. C is therefore wrong because you don't have that metric until 30 days has passed from upload time. "Click through rate" is one measure of popularity, so it fits, and is instant.
upvoted 1 times
...
NamitSehgal
3 years, 4 months ago
C looks correct.
upvoted 1 times
...
celia20200410
3 years, 9 months ago
ANS: C. D is wrong. Pearson's correlation coefficient is a linear correlation coefficient that returns a value between -1 and +1: -1 means there is a strong negative correlation, +1 means there is a strong positive correlation, and 0 means there is no correlation.
upvoted 3 times
...
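The coefficient range described in the comment above can be checked with a tiny hand-rolled Pearson function (a minimal sketch for illustration; in practice you would use numpy or scipy):

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))    # 1.0  (strong positive)
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))    # -1.0 (strong negative)
print(pearson([1, 2, 3, 4], [1, -1, -1, 1]))  # 0.0  (no linear correlation)
```

This is why option D (a coefficient of 0 between the 7-day and 30-day view counts) would indicate the model has found no relationship at all, the opposite of success.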

Topic 1 Question 58


Exam Professional Machine Learning Engineer topic 1 question 58 discussion

You are working on a Neural Network-based project. The dataset provided to you has columns with different ranges. While preparing the data for model training, you discover that gradient optimization is having difficulty moving weights to a good solution. What should you do?

  • A. Use feature construction to combine the strongest features.
  • B. Use the representation transformation (normalization) technique.
  • C. Improve the data cleaning step by removing features with missing values.
  • D. Change the partitioning step to reduce the dimension of the test set and have a larger training set.
Suggested Answer: B 🗳️

Comments

kurasaki
Highly Voted 4 years, 4 months ago
Vote for B. We could impute instead of removing the column, to avoid loss of information.
upvoted 27 times
...
pddddd
Highly Voted 4 years, 1 month ago
I also think it is B: "The presence of feature value X in the formula will affect the step size of the gradient descent. The difference in ranges of features will cause different step sizes for each feature. To ensure that the gradient descent moves smoothly towards the minima and that the steps for gradient descent are updated at the same rate for all the features, we scale the data before feeding it to the model."
upvoted 12 times
...
jsalvasoler
Most Recent 1 year, 3 months ago
Selected Answer: B
clearly B
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: B
B) Option B (Use the representation transformation technique) is the most relevant choice. Normalizing the features will help gradient descent converge efficiently, leading to better weight updates and improved model performance. Remember that feature scaling is crucial for gradient optimization, especially when dealing with features that have different ranges. By ensuring consistent scales, you’ll enhance the effectiveness of your Neural Network training process.
upvoted 3 times
...
MultiCloudIronMan
1 year, 7 months ago
Selected Answer: B
Because the range needs to normalize
upvoted 2 times
...
fragkris
1 year, 11 months ago
Selected Answer: B
B - The key phrase is "different ranges", therefore we need to normalize the values.
upvoted 3 times
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 1 times
...
SergioRubiano
2 years, 6 months ago
Selected Answer: B
Normalization
upvoted 1 times
...
ares81
2 years, 10 months ago
Selected Answer: B
Normalization is the word.
upvoted 2 times
...
ares81
2 years, 10 months ago
Selected Answer: C
Normalization is the word.
upvoted 1 times
...
hiromi
2 years, 11 months ago
Selected Answer: B
B "Normalization" is the keyword
upvoted 1 times
...
ggorzki
3 years, 9 months ago
Selected Answer: B
normalization https://developers.google.com/machine-learning/data-prep/transform/transform-numeric
upvoted 4 times
...
MK_Ahsan
3 years, 10 months ago
B. The problem does not mention anything about missing values. It needs to normalize the features with different ranges.
upvoted 4 times
...
NamitSehgal
3 years, 10 months ago
Looking at explanation I would choose C as well
upvoted 1 times
...
kaike_reis
3 years, 11 months ago
(B) - NN models need features with close ranges - SGD converges well using features in [0, 1] scale - The question specifically mentions "different ranges". Documentation: https://developers.google.com/machine-learning/data-prep/transform/transform-numeric
upvoted 3 times
...
Y2Data
4 years, 1 month ago
When gradient descent fails, it's due to the lack of a powerful feature. Using normalization would make it worse. Instead, using either A or C would increase the strength of certain features. But C should come first, since A is only feasible after at least one meaningful training run. So C.
upvoted 2 times
...
ralf_cc
4 years, 4 months ago
B - remove the outliers?
upvoted 3 times
omar_bh
4 years, 3 months ago
Normalization is more complicated than that. Normalization changes the values of a dataset's numeric fields to a common scale, without distorting differences in the ranges of values. Normalization is required only when features have different ranges.
upvoted 4 times
...
...

Topic 1 Question 59


Exam Professional Machine Learning Engineer topic 1 question 59 discussion

Your data science team needs to rapidly experiment with various features, model architectures, and hyperparameters. They need to track the accuracy metrics for various experiments and use an API to query the metrics over time. What should they use to track and report their experiments while minimizing manual effort?

  • A. Use Kubeflow Pipelines to execute the experiments. Export the metrics file, and query the results using the Kubeflow Pipelines API.
  • B. Use AI Platform Training to execute the experiments. Write the accuracy metrics to BigQuery, and query the results using the BigQuery API.
  • C. Use AI Platform Training to execute the experiments. Write the accuracy metrics to Cloud Monitoring, and query the results using the Monitoring API.
  • D. Use AI Platform Notebooks to execute the experiments. Collect the results in a shared Google Sheets file, and query the results using the Google Sheets API.
Suggested Answer: A 🗳️

Comments

Dunnoth
Highly Voted 2 years, 8 months ago
Selected Answer: A
Old answer is A. New answer (not available) would be Vertex AI Experiments, which comes with a monitoring API built in. https://cloud.google.com/blog/topics/developers-practitioners/track-compare-manage-experiments-vertex-ai-experiments
upvoted 18 times
...
Celia20210714
Highly Voted 4 years, 3 months ago
ANS: A https://codelabs.developers.google.com/codelabs/cloud-kubeflow-pipelines-gis Kubeflow Pipelines (KFP) helps solve these issues by providing a way to deploy robust, repeatable machine learning pipelines along with monitoring, auditing, version tracking, and reproducibility. Cloud AI Pipelines makes it easy to set up a KFP installation.
upvoted 12 times
...
b7ad1d9
Most Recent 1 month, 3 weeks ago
Selected Answer: B
Definitely an outdated question. Vertex AI Pipelines abstracts Kubeflow, so you get the experimentation of Kubeflow. The BQ API is the easiest and most robust for analytics. However, I have a Pavlovian reflex to pick Kubeflow whenever I see the word "Experiments" :)
upvoted 1 times
...
Fer660
2 months, 2 weeks ago
Selected Answer: B
Chose B. A: could be, but as written below this is an old question and Vertex AI Experiments is the correct approach these days. Not C: Monitoring is not the correct place to do analytics; write the results to BQ. Not D: this is clearly a diversion.
upvoted 2 times
...
mouthwash
10 months, 2 weeks ago
Selected Answer: B
A is an old answer. The platform is evolving. So B is the right answer.
upvoted 2 times
...
rajshiv
11 months, 1 week ago
Selected Answer: B
A is not correct. Agreed that Kubeflow Pipelines is a powerful tool for running and managing ML workflows, but exporting metrics to an external file (e.g., CSV or JSON) requires extra manual work for managing the data and querying it. Kubeflow Pipelines do not have the same native integration with BigQuery for storing metrics, and querying via the Kubeflow API can be more complex than using BigQuery, especially when managing large-scale experiments. So I will go with B.
upvoted 3 times
...
eico
1 year, 2 months ago
Selected Answer: A
This is an old question, when Vertex AI didn't have Vertex AI Experiments. The old answer is A
upvoted 1 times
...
San1111111111
1 year, 3 months ago
Shouldn't it be B? VAI has built-in VAI Experiments and Metadata to track metrics.
upvoted 1 times
...
dija123
1 year, 4 months ago
Selected Answer: A
Should agree with A
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: A
A) Kubeflow pipelines
upvoted 1 times
...
Mickey321
1 year, 12 months ago
Selected Answer: C
either A or C but going with C due to minimal effort
upvoted 5 times
...
Liting
2 years, 4 months ago
Selected Answer: A
I agree with tavva_prudhvi that cloud monitoring is not the best option to do machine learning tracking, Metadata is a better option for that purpose
upvoted 1 times
...
tavva_prudhvi
2 years, 4 months ago
Selected Answer: A
Option C suggests using AI Platform Training to execute the experiments and write the accuracy metrics to Cloud Monitoring. While Cloud Monitoring can be used to monitor and collect metrics from various services in Google Cloud, it is not specifically designed for machine learning experiments tracking. Using Cloud Monitoring for tracking machine learning experiments may not provide the same level of functionality and flexibility as Kubeflow Pipelines or AI Platform Training. Additionally, querying the results from Cloud Monitoring may not be as straightforward as using the APIs provided by Kubeflow Pipelines or AI Platform Training. Therefore, while Cloud Monitoring can be used as a general-purpose monitoring solution, it may not be the best option for tracking and reporting machine learning experiments.
upvoted 2 times
...
PST21
2 years, 4 months ago
Cloud Monitoring may not be the most suitable option for tracking and reporting experiments; because of this alone, option C is out and I stick with A.
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: A
Went with A
upvoted 2 times
...
lucaluca1982
2 years, 6 months ago
Selected Answer: B
It is B
upvoted 1 times
...
John_Pongthorn
2 years, 8 months ago
This is the question: try it out and choose what is closest to this lab (last updated Jan 21, 2023): https://codelabs.developers.google.com/vertex_experiments_pipelines_intro#0
upvoted 1 times
John_Pongthorn
2 years, 8 months ago
As the lab walks me through how to create a pipeline to experiment, it uses Kubeflow and applies the Experiments SDK.
upvoted 1 times
...
...

Topic 1 Question 60


Exam Professional Machine Learning Engineer topic 1 question 60 discussion

You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are identified as fraudulent. Which data transformation strategy would likely improve the performance of your classifier?

  • A. Write your data in TFRecords.
  • B. Z-normalize all the numeric features.
  • C. Oversample the fraudulent transaction 10 times.
  • D. Use one-hot encoding on all categorical features.
Suggested Answer: C 🗳️

Comments

ralf_cc
Highly Voted 4 years, 4 months ago
C - https://swarit.medium.com/detecting-fraudulent-consumer-transactions-through-machine-learning-25b1f2cabbb4
upvoted 14 times
...
NamitSehgal
Highly Voted 3 years, 10 months ago
Selected Answer: C
C is the answer
upvoted 5 times
...
dija123
Most Recent 1 year, 4 months ago
Selected Answer: C
Agree with C
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: C
C) Oversample
upvoted 1 times
...
MultiCloudIronMan
1 year, 7 months ago
Selected Answer: C
Oversampling increases the number of fraudulent transaction in the training data to enable the machine to learn how to predict them
upvoted 3 times
...
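The oversampling in option C, as the comment above describes it, is just replication of the minority class before training. A minimal sketch (the function name and data are my own, for illustration):

```python
def oversample_minority(X, y, minority=1, factor=10):
    """Replicate each minority-class row so it appears `factor` times."""
    X_out, y_out = list(X), list(y)
    for xi, yi in zip(X, y):
        if yi == minority:
            X_out.extend([xi] * (factor - 1))
            y_out.extend([yi] * (factor - 1))
    return X_out, y_out

# 1% fraud, as in the question: 1 fraudulent row among 100 transactions.
X = [[100.0]] * 99 + [[9_999.0]]
y = [0] * 99 + [1]

X_os, y_os = oversample_minority(X, y)
print(sum(y_os), len(y_os))  # 10 fraudulent rows out of 109 (~9%)
```

In practice you would apply this only to the training split (never to validation or test data); class weights or synthetic sampling such as SMOTE are common alternatives.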
fragkris
1 year, 11 months ago
Selected Answer: C
C - Even though most similar questions propose to downsample the majority (not fraudulent) and add weights to it.
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 2 times
...
wish0035
2 years, 11 months ago
Selected Answer: C
ans: C. A, B, D => wouldn't help with imbalance.
upvoted 1 times
...
hiromi
2 years, 11 months ago
Selected Answer: C
C https://medium.com/analytics-vidhya/credit-card-fraud-detection-how-to-handle-imbalanced-dataset-1f18b6f881
upvoted 1 times
...
Mohamed_Mossad
3 years, 4 months ago
Selected Answer: C
the best option is C
upvoted 1 times
...

Topic 1 Question 61


Exam Professional Machine Learning Engineer topic 1 question 61 discussion

You are using transfer learning to train an image classifier based on a pre-trained EfficientNet model. Your training dataset has 20,000 images. You plan to retrain the model once per day. You need to minimize the cost of infrastructure. What platform components and configuration environment should you use?

  • A. A Deep Learning VM with 4 V100 GPUs and local storage.
  • B. A Deep Learning VM with 4 V100 GPUs and Cloud Storage.
  • C. A Google Kubernetes Engine cluster with a V100 GPU Node Pool and an NFS Server
  • D. An AI Platform Training job using a custom scale tier with 4 V100 GPUs and Cloud Storage
Suggested Answer: D 🗳️

Comments

wish0035
Highly Voted 2 years, 11 months ago
Selected Answer: D
ans: D
A, C => local storage, NFS... discarded. Google encourages you to use Cloud Storage.
B => could do the job, but here I would focus on the "daily training" thing, because Vertex AI Training jobs are better for this. Also, I think Google usually encourages Vertex AI over VMs.
upvoted 15 times
...
OpenKnowledge
Most Recent 1 month, 1 week ago
Selected Answer: D
Using a custom scale tier in Google Cloud provides the flexibility and control to precisely tailor resources for specific workloads, which is not possible with the predefined configurations of the basic and premium tiers. The main benefits include greater cost optimization, fine-tuned performance, and the ability to accommodate unique or specialized requirements.
upvoted 1 times
...
gvk1
6 months, 3 weeks ago
Selected Answer: D
Generally, deep learning models need more than 1 day to train, so option D stands out.
upvoted 1 times
...
thescientist
10 months, 2 weeks ago
Selected Answer: D
D, because with AI Platform (Vertex) you only pay for infrastructure while the job runs, whereas VMs are billed as long as they are up; there is no auto shutdown.
upvoted 1 times
...
oddsoul
1 year, 1 month ago
Selected Answer: D
Answer: D auto scaling
upvoted 1 times
...
San1111111111
1 year, 3 months ago
D because automatic scaling
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: D
D) Is the best answer
upvoted 1 times
...
abhay669
1 year, 11 months ago
Selected Answer: D
I'll go with D. How is C correct?
upvoted 1 times
...
Mickey321
1 year, 12 months ago
Selected Answer: A
D as need to minimize cost
upvoted 1 times
...
Mdso
2 years, 3 months ago
Selected Answer: A
I think it is A. Refer to Q20 of the GCP Sample Questions - they say managed services (such as Kubeflow Pipelines / Vertex AI) are not the options for 'minimizing costs'. In this case, you should configure your own infrastructure to train the model leaving A,B. Undecided between A,B because A would minimize costs, but also result in inefficient I/O operations during training.
upvoted 2 times
...
tavva_prudhvi
2 years, 4 months ago
Selected Answer: D
The pre-trained EfficientNet model can be easily loaded from Cloud Storage, which eliminates the need for local storage or an NFS server. Using AI Platform Training allows for the automatic scaling of resources based on the size of the dataset, which can save costs compared to using a fixed-size VM or node pool. Additionally, the ability to use custom scale tiers allows for fine-tuning of resource allocation to match the specific needs of the training job.
upvoted 2 times
...
M25
2 years, 6 months ago
Selected Answer: D
Went with D
upvoted 1 times
...
shankalman717
2 years, 8 months ago
Selected Answer: B
B. A Deep Learning VM with 4 V100 GPUs and Cloud Storage. For this scenario, a Deep Learning VM with 4 V100 GPUs and Cloud Storage is likely the most cost-effective solution while still providing sufficient computing resources for the model training. Using Cloud Storage can allow the model to be trained and the data to be stored in a scalable and cost-effective way. Option A, using a Deep Learning VM with local storage, may not provide enough storage capacity to store the training data and model checkpoints. Option C, using a Kubernetes Engine cluster, can be overkill for the size of the job and adds additional complexity. Option D, using an AI Platform Training job, is a good option as it is designed for running machine learning jobs at scale, but may be more expensive than a Deep Learning VM with Cloud Storage.
upvoted 3 times
...
enghabeth
2 years, 9 months ago
Selected Answer: D
because it's cheap
upvoted 1 times
...
hiromi
2 years, 11 months ago
Selected Answer: D
it seems D
upvoted 3 times
...
OzoneReloaded
2 years, 11 months ago
Selected Answer: D
I think it's D
upvoted 2 times
...
JeanEl
2 years, 11 months ago
Selected Answer: B
It's D
upvoted 2 times
...

Topic 1 Question 62


Exam Professional Machine Learning Engineer topic 1 question 62 discussion

While conducting an exploratory analysis of a dataset, you discover that categorical feature A has substantial predictive power, but it is sometimes missing. What should you do?

  • A. Drop feature A if more than 15% of values are missing. Otherwise, use feature A as-is.
  • B. Compute the mode of feature A and then use it to replace the missing values in feature A.
  • C. Replace the missing values with the values of the feature with the highest Pearson correlation with feature A.
  • D. Add an additional class to categorical feature A for missing values. Create a new binary feature that indicates whether feature A is missing.
Suggested Answer: D 🗳️

Comments

wish0035
Highly Voted 2 years, 4 months ago
Selected Answer: D
ans: D
A => no, you don't want to drop a feature with high predictive power.
B => I think this could confuse the model... a better solution could be to fill missing values using an algorithm like Expectation Maximization, but using the mode is, I think, a bad idea in this case, because if you have a significant number of missing values (for example >10%) this would modify the "predictive power". You don't want to lose the predictive power of a feature, just guide the model to learn when to use that feature and when to ignore it.
C => this doesn't make any sense to me; not sure why I would do that.
D => I think this could be a really good approach, and I'm pretty sure it would work well with a lot of models. The model would learn that when "is_available_feat_A" == True it should use feature A, but whenever it is missing it should try to use other features.
upvoted 16 times
frangm23
2 years, 1 month ago
I guess I would go with D, but what confuses me is that option D doesn't say the NaN values are replaced (only that a new column is added), and this could lead to problems like exploding gradients. Plus, Google encourages replacing missing values. https://developers.google.com/machine-learning/testing-debugging/common/data-errors Any thoughts on this?
upvoted 2 times
...
...
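Option D as the commenters above describe it can be sketched in a few lines of plain Python (pandas' `fillna` plus an `isna()` indicator would achieve the same; the function name and sentinel here are illustrative):

```python
def encode_missing(values, sentinel="MISSING"):
    """Option D: fill NAs with an explicit class and add a binary indicator."""
    filled = [v if v is not None else sentinel for v in values]
    is_missing = [0 if v is not None else 1 for v in values]
    return filled, is_missing

feature_a = ["red", None, "blue", "red", None]
filled, indicator = encode_missing(feature_a)
print(filled)     # ['red', 'MISSING', 'blue', 'red', 'MISSING']
print(indicator)  # [0, 1, 0, 0, 1]
```

The sentinel class lets the model treat "missing" as its own category, and the indicator column makes missingness available as an explicit signal, so the feature's predictive power is retained without inventing values.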
hiromi
Highly Voted 2 years, 4 months ago
Selected Answer: B
B "For categorical variables, we can usually replace missing values with mean, median, or most frequent values" Dr. Logan Song - Journey to Become a Google Cloud Machine Learning Engineer - Page 48
upvoted 5 times
tavva_prudhvi
1 year, 6 months ago
While this approach may seem reasonable, it can introduce bias in the dataset by over-representing the mode, especially if the missing values are not missing at random.
upvoted 1 times
...
...
b7ad1d9
Most Recent 1 month, 3 weeks ago
Selected Answer: D
Voted D because that seems to cause less harm than the other good option B (replace with mode). IRL, you would be guided by the domain and the use case. Too little info here!
upvoted 1 times
...
Fer660
2 months, 2 weeks ago
Selected Answer: B
Not A: destroys value. B is correct. Not C: obvious nonsense. Not D: the 'tell' is that this approach does the same thing twice. If we already added a new category for the missing values, why add a binary feature to show the same thing? Makes no sense.
upvoted 1 times
...
PhilipKoku
11 months, 1 week ago
Selected Answer: D
D) Good approach
upvoted 1 times
...
MultiCloudIronMan
1 year, 1 month ago
Selected Answer: B
Google encourages filling missing values, and using the mode is one of the examples given. D only tells us the obvious: data is missing!
upvoted 2 times
...
fragkris
1 year, 5 months ago
Selected Answer: D
B and D are correct, but I decided to go with D.
upvoted 1 times
...
Mickey321
1 year, 5 months ago
Selected Answer: D
highly predictive
upvoted 1 times
...
ichbinnoah
1 year, 6 months ago
Selected Answer: B
Definitely not D, it does not even solve the problem of NA values.
upvoted 2 times
...
andresvelasco
1 year, 7 months ago
Options B or D. But isn't there an inconsistency in option D? If you replace missing values with a new category ("missing"), why would you have to create an extra feature?
upvoted 1 times
...
Liting
1 year, 10 months ago
Selected Answer: D
Agree with wish0035, answer should be D
upvoted 1 times
...
PST21
1 year, 10 months ago
By creating a new class for the missing values, you explicitly capture the absence of data, which can provide valuable information for predictive modeling. Additionally, creating a binary feature allows the model to distinguish between cases where feature A is present and cases where it is missing, which can be useful for identifying potential patterns or relationships in the data.
upvoted 2 times
...
amtg
1 year, 11 months ago
Selected Answer: B
By imputing the missing values with the mode (the most frequent value), you retain the original feature's predictive power while handling the missing values
upvoted 1 times
...
Scipione_
1 year, 11 months ago
Selected Answer: D
Both B and D are possible, but the correct answer is D because of the feature's high predictive power.
upvoted 2 times
...
M25
2 years ago
Selected Answer: D
Went with D
upvoted 1 times
...
tavva_prudhvi
2 years, 1 month ago
I think it's D. Option B of imputing the missing values of feature A with the mode of feature A could be a reasonable approach if the mode provides a good representation of the distribution of feature A. However, this method may lead to biased results if the mode is not representative of the missing values. This could be the case if the missing values have a different distribution than the observed values. Similarly, when a categorical feature has substantial predictive power, it is important not to discard it. Instead, missing values can be handled by adding an additional class for missing values and creating a new binary feature that indicates whether feature A is missing or not. This approach ensures that the predictive power of feature A is retained while accounting for missing values. Computing the mode of feature A and replacing missing values may distort the distribution of the feature and create bias in the analysis. Similarly, replacing missing values with values from another feature may introduce noise and lead to incorrect results.
upvoted 2 times
...
BenMS
2 years, 2 months ago
Selected Answer: D
If our objective was to produce a complete dataset then we might use some average value to fill in the gaps (option B) but in this case we want to predict an outcome, so inventing our own data is not going to help in my view. Option D is the most sensible approach to let the model choose the best features.
upvoted 1 times
...
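
The two strategies the thread debates (option B's mode imputation vs. option D's missing-category-plus-indicator) can be sketched in a few lines of plain Python; the column values below are hypothetical:

```python
from collections import Counter

# hypothetical categorical feature A with gaps
raw = ["red", "blue", None, "red", None, "green", "red"]

# Option B: impute missing values with the mode of the observed values
observed = [v for v in raw if v is not None]
mode = Counter(observed).most_common(1)[0][0]
imputed_b = [v if v is not None else mode for v in raw]

# Option D: keep a dedicated "Missing" class, plus a binary indicator,
# so the model can learn from the fact that the value was absent
imputed_d = [v if v is not None else "Missing" for v in raw]
is_missing = [1 if v is None else 0 for v in raw]
```

Note how B silently inflates the mode's count, which is the bias several commenters point out, while D preserves the missingness signal (and, as andresvelasco notes, the indicator column duplicates the "Missing" class for a single feature).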

Topic 1 Question 63

Exam Professional Machine Learning Engineer topic 1 question 63 discussion

You work for a large retailer and have been asked to segment your customers by their purchasing habits. The purchase history of all customers has been uploaded to BigQuery. You suspect that there may be several distinct customer segments, however you are unsure of how many, and you don’t yet understand the commonalities in their behavior. You want to find the most efficient solution. What should you do?

  • A. Create a k-means clustering model using BigQuery ML. Allow BigQuery to automatically optimize the number of clusters.
  • B. Create a new dataset in Dataprep that references your BigQuery table. Use Dataprep to identify similarities within each column.
  • C. Use the Data Labeling Service to label each customer record in BigQuery. Train a model on your labeled data using AutoML Tables. Review the evaluation metrics to understand whether there is an underlying pattern in the data.
  • D. Get a list of the customer segments from your company’s Marketing team. Use the Data Labeling Service to label each customer record in BigQuery according to the list. Analyze the distribution of labels in your dataset using Data Studio.
Suggested Answer: A 🗳️

Comments

PhilipKoku
11 months, 1 week ago
Selected Answer: A
A) K-means is ideal for unsupervised clustering
upvoted 2 times
...
MultiCloudIronMan
1 year, 1 month ago
Selected Answer: A
K-means algorithm is used for grouping/clustering data in unsupervised learning experiments.
upvoted 3 times
...
M25
2 years ago
Selected Answer: A
Went with A
upvoted 4 times
...
CloudKida
2 years ago
Selected Answer: A
When to use k-means: Your data may contain natural groupings or clusters of data. You may want to identify these groupings descriptively in order to make data-driven decisions. For example, a retailer may want to identify natural groupings of customers who have similar purchasing habits or locations. This process is known as customer segmentation. https://cloud.google.com/bigquery/docs/kmeans-tutorial
upvoted 4 times
...
tavva_prudhvi
2 years, 1 month ago
A This is the most efficient solution for segmenting customers based on their purchasing habits, as it utilizes BigQuery's built-in machine learning capabilities to identify distinct clusters of customers based on their purchasing behavior. By allowing BigQuery to automatically optimize the number of clusters, you can ensure that the model identifies the most appropriate number of segments based on the data, without having to manually select the number of clusters.
upvoted 2 times
...
ares81
2 years, 4 months ago
Selected Answer: A
I correct myself. It's A: According to the documentation, if you omit the num_clusters option, BigQuery ML will choose a reasonable default based on the total number of rows in the training data.
upvoted 2 times
...
hiromi
2 years, 4 months ago
Selected Answer: A
A https://cloud.google.com/bigquery-ml/docs/kmeans-tutorial https://towardsdatascience.com/how-to-use-k-means-clustering-in-bigquery-ml-to-understand-and-describe-your-data-better-c972c6f5733b
upvoted 3 times
...
wish0035
2 years, 4 months ago
Selected Answer: A
ans: A, pretty sure. C, D => discarded, very time consuming. B => yes, you can identify similarities within each column, but when I read "you don’t yet understand the commonalities in their behavior" I understand that this job would be difficult, because there could be many columns to analyze, and I don't think that this would be efficient. A => BigQuery ML supports k-means clustering, it's easy and efficient to create, and it would automatically detect the number of clusters. Also from the BigQuery ML docs: "K-means clustering for data segmentation; for example, identifying customer segments." (Source: https://cloud.google.com/bigquery-ml/docs/introduction#supported_models_in)
upvoted 4 times
...
LearnSodas
2 years, 4 months ago
Selected Answer: A
K-means is a good unsupervised learning algorithm to segment a population based on similarity. We can use k-means directly in BigQuery, so I think it's "the most efficient way". Labeling is not a good option since we don't really know what makes a customer similar to another, and why Dataprep if we can use BigQuery directly?
upvoted 4 times
...
ares81
2 years, 5 months ago
It seems B, to me.
upvoted 1 times
...
neochaotic
2 years, 5 months ago
Selected Answer: B
Its B! Dataprep provides Data profiling functionalities
upvoted 1 times
...
japoji
2 years, 5 months ago
The question is about commonalities of clients by characteristics, not about characteristics by client. I mean, with B you are looking for segments of the characteristics which define a client. But you need segments of clients defined by characteristics.
upvoted 1 times
...
Vedjha
2 years, 5 months ago
Will go for 'A' as it is easy to build model in BQML where data is already present and optimization would be auto in case of K-mean algo
upvoted 4 times
...
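
BigQuery ML runs this at scale from SQL (`CREATE MODEL ... OPTIONS(model_type='kmeans')`, with `num_clusters` chosen automatically if omitted). To show what the algorithm itself does, here is a minimal stdlib sketch of the Lloyd iteration on two hypothetical customer segments (naive deterministic initialization, made-up coordinates):

```python
def kmeans(points, k=2, iters=10):
    centers = list(points[:k])  # naive init for the sketch: first k points
    for _ in range(iters):
        # assign each point to its nearest center
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[nearest].append(p)
        # recompute each center as the mean of its group
        for i, g in enumerate(groups):
            if g:  # keep the old center if a group empties out
                centers[i] = tuple(sum(dim) / len(g) for dim in zip(*g))
    return centers

# hypothetical segments: low spenders vs. high spenders (frequency, basket size)
low = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2)]
high = [(8.0, 9.0), (8.5, 9.5), (7.8, 9.2)]
centers = kmeans(low + high, k=2)
```

The two recovered centers land near the means of the two segments, which is the "natural groupings" behavior the comments describe.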

Topic 1 Question 64

Exam Professional Machine Learning Engineer topic 1 question 64 discussion

You recently designed and built a custom neural network that uses critical dependencies specific to your organization’s framework. You need to train the model using a managed training service on Google Cloud. However, the ML framework and related dependencies are not supported by AI Platform Training. Also, both your model and your data are too large to fit in memory on a single machine. Your ML framework of choice uses the scheduler, workers, and servers distribution structure. What should you do?

  • A. Use a built-in model available on AI Platform Training.
  • B. Build your custom container to run jobs on AI Platform Training.
  • C. Build your custom containers to run distributed training jobs on AI Platform Training.
  • D. Reconfigure your code to a ML framework with dependencies that are supported by AI Platform Training.
Suggested Answer: C 🗳️

Comments

mil_spyro
Highly Voted 2 years, 4 months ago
Selected Answer: C
Answer C. By running your machine learning (ML) training job in a custom container, you can use ML frameworks, non-ML dependencies, libraries, and binaries that are not otherwise supported on Vertex AI. Model and your data are too large to fit in memory on a single machine hence distributed training jobs. https://cloud.google.com/vertex-ai/docs/training/containers-overview
upvoted 12 times
...
PhilipKoku
Most Recent 11 months, 1 week ago
Selected Answer: C
C) Distributed training with customer containers
upvoted 1 times
...
MultiCloudIronMan
1 year, 1 month ago
Selected Answer: C
This allows using external dependences and distributed training will solve the memory issues
upvoted 3 times
...
Werner123
1 year, 2 months ago
Selected Answer: C
Critical dependencies that are not supported -> Custom container Too large to fit in memory on a single machine -> Distributed
upvoted 3 times
...
M25
2 years ago
Selected Answer: C
Went with C
upvoted 1 times
...
wish0035
2 years, 4 months ago
Selected Answer: C
ans: C A, D => too much work. B => discarded because "model and your data are too large to fit in memory on a single machine"
upvoted 1 times
...
ares81
2 years, 5 months ago
C, for me!
upvoted 1 times
...
JeanEl
2 years, 5 months ago
Selected Answer: C
I think it's C
upvoted 1 times
...
Vedjha
2 years, 5 months ago
Will go for 'C'- Custom containers can address the env limitation and distributed processing will handle the data volume
upvoted 1 times
...
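
In a distributed custom-container job, each replica learns its role from an environment variable the training service injects (TF_CONFIG for TensorFlow; a CLUSTER_SPEC-style JSON for custom containers). A container entrypoint can parse it to decide whether the replica runs as the scheduler, a worker, or a server. The JSON below is a hypothetical example of that shape, not an actual service payload:

```python
import json
import os

# hypothetical cluster spec matching the scheduler/workers/servers structure
os.environ["CLUSTER_SPEC"] = json.dumps({
    "cluster": {
        "scheduler": ["10.0.0.1:7077"],
        "worker": ["10.0.0.2:7078", "10.0.0.3:7078"],
        "server": ["10.0.0.4:7079"],
    },
    "task": {"type": "worker", "index": 1},
})

def my_role():
    # read the injected spec and resolve this replica's role and address
    spec = json.loads(os.environ["CLUSTER_SPEC"])
    task = spec["task"]
    peers = spec["cluster"][task["type"]]
    return task["type"], peers[task["index"]]

role, addr = my_role()
```

Each replica runs the same container image; only this parsed role differs, which is what makes option C's single custom container sufficient for the whole cluster.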

Topic 1 Question 65

Exam Professional Machine Learning Engineer topic 1 question 65 discussion

While monitoring your model training’s GPU utilization, you discover that you have a native synchronous implementation. The training data is split into multiple files. You want to reduce the execution time of your input pipeline. What should you do?

  • A. Increase the CPU load
  • B. Add caching to the pipeline
  • C. Increase the network bandwidth
  • D. Add parallel interleave to the pipeline
Suggested Answer: D 🗳️

Comments

hiromi
Highly Voted 2 years, 4 months ago
Selected Answer: D
It's D https://www.tensorflow.org/guide/data_performance
upvoted 8 times
...
PhilipKoku
Most Recent 11 months, 1 week ago
Selected Answer: D
D) Parallelisation required
upvoted 2 times
...
MultiCloudIronMan
1 year, 1 month ago
Selected Answer: D
Multiple files reduce execution time through parallelism
upvoted 2 times
...
Werner123
1 year, 2 months ago
Selected Answer: D
"training data split into multiple files", "reduce the execution time of your input pipeline" -> Parallel interleave
upvoted 3 times
...
M25
2 years ago
Selected Answer: D
Went with D
upvoted 1 times
...
OzoneReloaded
2 years, 5 months ago
Selected Answer: D
I think it's D
upvoted 2 times
...
Vedjha
2 years, 5 months ago
D for me
upvoted 2 times
...
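
In tf.data this is `Dataset.interleave(..., num_parallel_calls=tf.data.AUTOTUNE)`. The stdlib sketch below only illustrates the access pattern: instead of exhausting one file before opening the next (the naive synchronous pipeline), records are pulled from the shards round-robin, so a slow read on one shard can overlap with work on the others (file names and contents hypothetical):

```python
from itertools import chain, zip_longest

files = {
    "shard-0": ["a0", "a1", "a2"],
    "shard-1": ["b0", "b1", "b2"],
    "shard-2": ["c0", "c1", "c2"],
}

# naive synchronous pipeline: exhaust one file before opening the next
sequential = list(chain.from_iterable(files.values()))

# interleaved pipeline: pull one record from each file in turn
interleaved = [r for batch in zip_longest(*files.values())
               for r in batch if r is not None]
```

The real speedup comes from issuing those per-shard reads concurrently, which is exactly what the `num_parallel_calls` argument enables.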

Topic 1 Question 66

Exam Professional Machine Learning Engineer topic 1 question 66 discussion

Your data science team is training a PyTorch model for image classification based on a pre-trained ResNet model. You need to perform hyperparameter tuning to optimize for several parameters. What should you do?

  • A. Convert the model to a Keras model, and run a Keras Tuner job.
  • B. Run a hyperparameter tuning job on AI Platform using custom containers.
  • C. Create a Kubeflow Pipelines instance, and run a hyperparameter tuning job on Katib.
  • D. Convert the model to a TensorFlow model, and run a hyperparameter tuning job on AI Platform.
Suggested Answer: B 🗳️

Comments

OzoneReloaded
Highly Voted 2 years, 5 months ago
Selected Answer: B
B because Vertex AI supports custom models hyperparameter tuning
upvoted 11 times
...
John_Pongthorn
Highly Voted 2 years, 3 months ago
Selected Answer: B
Not C: don't waste your time converting to another framework; you can absolutely use it in a custom container. https://cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-train-and-tune-pytorch-models-vertex-ai
upvoted 5 times
John_Pongthorn
2 years, 3 months ago
I insist on B. At present, it seems like we could use a prebuilt container instead of a custom container, but that is none of the 4 choices, so B is the most likely way out of this question.
upvoted 3 times
...
...
PhilipKoku
Most Recent 11 months, 1 week ago
Selected Answer: B
B) Custom containers
upvoted 1 times
...
M25
2 years ago
Selected Answer: B
Went with B
upvoted 1 times
...
John_Pongthorn
2 years, 2 months ago
Selected Answer: B
This is a question sourced from google blog pre-trained BERT model https://cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-train-and-tune-pytorch-models-vertex-ai https://cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-deploy-pytorch-models-vertex-ai
upvoted 1 times
...
wish0035
2 years, 4 months ago
Selected Answer: B
ans: B A, D => too much work. C => not sure why you would complicate so much when Vertex AI has this feature in custom containers.
upvoted 5 times
...
Vedjha
2 years, 5 months ago
C seems to be correct - https://www.kubeflow.org/docs/components/katib/overview/
upvoted 1 times
LearnSodas
2 years, 5 months ago
Why use a third-party tool when Vertex AI already lets you tune hyperparameters in custom containers? I think it's B
upvoted 5 times
...
...
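
What the managed tuning service automates is a search loop like the one below: propose hyperparameters, run a trial, record the metric, keep the best. This stdlib sketch uses random search with a toy objective standing in for a real training run (the parameter names and ranges are hypothetical; a real custom container would report each trial's metric back to the service rather than compute it locally):

```python
import random

def validation_loss(lr, momentum):
    # toy stand-in for training the PyTorch model and measuring validation loss;
    # the minimum is at lr=0.01, momentum=0.9
    return (lr - 0.01) ** 2 + (momentum - 0.9) ** 2

rng = random.Random(42)
best = None
for _ in range(200):  # 200 trials
    trial = {"lr": rng.uniform(1e-4, 0.1), "momentum": rng.uniform(0.5, 0.99)}
    loss = validation_loss(**trial)
    if best is None or loss < best[0]:
        best = (loss, trial)
```

The service replaces this loop with smarter search strategies (e.g. Bayesian optimization) and parallel trials, which is why option B needs no framework conversion at all.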

Topic 1 Question 67

Exam Professional Machine Learning Engineer topic 1 question 67 discussion

You have a large corpus of written support cases that can be classified into 3 separate categories: Technical Support, Billing Support, or Other Issues. You need to quickly build, test, and deploy a service that will automatically classify future written requests into one of the categories. How should you configure the pipeline?

  • A. Use the Cloud Natural Language API to obtain metadata to classify the incoming cases.
  • B. Use AutoML Natural Language to build and test a classifier. Deploy the model as a REST API.
  • C. Use BigQuery ML to build and test a logistic regression model to classify incoming requests. Use BigQuery ML to perform inference.
  • D. Create a TensorFlow model using Google’s BERT pre-trained model. Build and test a classifier, and deploy the model using Vertex AI.
Suggested Answer: B 🗳️

Comments

wish0035
Highly Voted 2 years, 4 months ago
Selected Answer: B
ans: B. A => no, you need customization. C, D => more work and complexity. B => AutoML is easier and faster, and "you need to quickly build, test, and deploy". Also the REST API part fits our use case.
upvoted 10 times
...
PhilipKoku
Most Recent 11 months, 1 week ago
Selected Answer: B
B) AutoML NLP
upvoted 2 times
...
gscharly
1 year ago
Went with B
upvoted 1 times
...
MultiCloudIronMan
1 year, 1 month ago
Selected Answer: B
AutoML is faster and offers the requisite REST API
upvoted 1 times
...
Werner123
1 year, 2 months ago
Selected Answer: B
"quickly build, test and deploy" + custom categories -> AutoML
upvoted 1 times
...
M25
2 years ago
Selected Answer: B
Went with B
upvoted 1 times
...
frangm23
2 years ago
Selected Answer: B
I think it's B, but I don't understand why it doesn't suggest to deploy the model on Vertex AI instead of as a REST API.
upvoted 1 times
...
enghabeth
2 years, 3 months ago
Selected Answer: B
ans B because it's faster
upvoted 1 times
...
hiromi
2 years, 4 months ago
Selected Answer: B
B wish0035 explained
upvoted 3 times
...
ares81
2 years, 5 months ago
Quickly: AutoML: B.
upvoted 1 times
...
OzoneReloaded
2 years, 5 months ago
Selected Answer: B
I think it's B because of the deployment
upvoted 1 times
...
Vedjha
2 years, 5 months ago
B will give quick result on classification
upvoted 1 times
...

Topic 1 Question 68

Exam Professional Machine Learning Engineer topic 1 question 68 discussion

You need to quickly build and train a model to predict the sentiment of customer reviews with custom categories without writing code. You do not have enough data to train a model from scratch. The resulting model should have high predictive performance. Which service should you use?

  • A. AutoML Natural Language
  • B. Cloud Natural Language API
  • C. AI Hub pre-made Jupyter Notebooks
  • D. AI Platform Training built-in algorithms
Suggested Answer: A 🗳️

Comments

OpenKnowledge
2 months ago
Selected Answer: A
AutoML frequently uses transfer learning, especially for deep learning tasks like image recognition. It is one of the core techniques that enables AutoML to train high-quality models quickly and with smaller datasets. Instead of building a deep learning model from scratch, AutoML starts with a pre-trained model that has already learned to recognize features from a massive dataset, like ImageNet for images.
upvoted 1 times
...
moammary
9 months, 3 weeks ago
Selected Answer: B
You do not have enough data to train a model from scratch. AutoML needs training data.
upvoted 3 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: A
custom modeling needed
upvoted 2 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: A
A) AutoML - Codeless
upvoted 2 times
...
nmnm22
1 year, 5 months ago
"Quickly build" >> usually go with the low-code/no-code options of autoML
upvoted 1 times
...
b2aaace
1 year, 6 months ago
Selected Answer: B
AutoML does not have transfer learning capabilities as of now. Given that there are not enough data to train from scratch, B is the only option that makes sense.
upvoted 2 times
pinimichele01
1 year, 6 months ago
https://cloud.google.com/vertex-ai/docs/text-data/sentiment-analysis/prepare-data
upvoted 1 times
...
...
MultiCloudIronMan
1 year, 7 months ago
Selected Answer: A
This is a suitable job for AutoML; it uses transfer learning when there is little data for training.
upvoted 1 times
MultiCloudIronMan
1 year, 2 months ago
AutoML now supports Transfer learning, I checked it.
upvoted 1 times
...
...
LFavero
1 year, 8 months ago
Selected Answer: A
AutoML Natural Language is designed to work well even with relatively small datasets. It uses transfer learning and other techniques to train models effectively on limited data, which is crucial since there isn't enough data to train a model from scratch.
upvoted 3 times
...
Krish6488
1 year, 12 months ago
Selected Answer: A
Custom models and custom categories and hence AutoML natural language, It would still work with less data
upvoted 1 times
...
Sahana_98
2 years ago
Selected Answer: B
NO DATA TO TRAIN THE MODEL FROM SCRATCH
upvoted 2 times
GuineaPigHunter
1 year, 5 months ago
"You do not have enough data to train a model from scratch" - I think this means that there is SOME data but not a lot, something which AutoML can handle.
upvoted 2 times
...
...
M25
2 years, 6 months ago
Selected Answer: A
Went with A
upvoted 1 times
...
dfdrin
2 years, 7 months ago
Selected Answer: A
It's A. "Custom categories" means B can't be correct
upvoted 3 times
...
tavva_prudhvi
2 years, 8 months ago
Its A, Check this document, https://cloud.google.com/natural-language/automl/docs/beginners-guide The Natural Language API discovers syntax, entities, and sentiment in text, and classifies text into a predefined set of categories.
upvoted 3 times
...
shankalman717
2 years, 8 months ago
Selected Answer: B
If you do not have enough data to train a model from scratch, then it may be more appropriate to use a pre-trained model or a pre-made Jupyter Notebook. Option B, the Cloud Natural Language API, could still be a viable option if you have access to labeled data for sentiment analysis. The API provides pre-trained models for sentiment analysis that you can use to classify text. However, if you have custom categories or labels, then you would need to train a custom model, which may not be feasible with limited data.
upvoted 4 times
...
enghabeth
2 years, 9 months ago
Selected Answer: A
https://www.toptal.com/machine-learning/google-nlp-tutorial#:~:text=Google%20Natural%20Language%20API%20vs.&text=Google%20AutoML%20Natural%20Language%20is,t%20require%20machine%20learning%20knowledge. In this case need custom categories without writing code
upvoted 2 times
...
John_Pongthorn
2 years, 9 months ago
Selected Answer: A
Quickly ==> A and B. Custom categories + you do not have enough data to train a model (it doesn't mean no data to train; there are probably a few samples, let's say 10, per category, as this link shows: https://cloud.google.com/natural-language/automl/docs/beginners-guide#include-enough-labeled-examples-in-each-category) ==> A
upvoted 2 times
...
John_Pongthorn
2 years, 9 months ago
Selected Answer: B
Quickly ==> A and B and custom categories + you do not have enough data to train a model (it doesn't mean no data to train) it will probably have a few samples Let's say 10 samples) ==> B
upvoted 1 times
John_Pongthorn
2 years, 9 months ago
Sorry, I go with A A A A A A
upvoted 3 times
...
John_Pongthorn
2 years, 9 months ago
https://cloud.google.com/natural-language/automl/docs/beginners-guide#include-enough-labeled-examples-in-each-category
upvoted 2 times
...
...

Topic 1 Question 69

Exam Professional Machine Learning Engineer topic 1 question 69 discussion

You need to build an ML model for a social media application to predict whether a user’s submitted profile photo meets the requirements. The application will inform the user if the picture meets the requirements. How should you build a model to ensure that the application does not falsely accept a non-compliant picture?

  • A. Use AutoML to optimize the model’s recall in order to minimize false negatives.
  • B. Use AutoML to optimize the model’s F1 score in order to balance the accuracy of false positives and false negatives.
  • C. Use Vertex AI Workbench user-managed notebooks to build a custom model that has three times as many examples of pictures that meet the profile photo requirements.
  • D. Use Vertex AI Workbench user-managed notebooks to build a custom model that has three times as many examples of pictures that do not meet the profile photo requirements.
Suggested Answer: A 🗳️

Comments

LearnSodas
Highly Voted 2 years, 11 months ago
I think it's B, since we want to reduce false positives
upvoted 21 times
jamesking1103
2 years, 10 months ago
B yes, A is incorrect as minimize false negatives does not help
upvoted 3 times
julesnoa
1 year, 1 month ago
False negative: Non-compliant, but did not alert. That is what we want to minimize.
upvoted 2 times
julesnoa
1 year, 1 month ago
Upon reading further it seems like the model predicts compliance, so a positive means the picture is compliant. Then B seems more appropriate
upvoted 1 times
...
...
NickHapton
2 years, 4 months ago
A non-compliant profile image = positive. False negatives = didn't alert on a non-compliant profile image. So the objective is to minimize false negatives.
upvoted 10 times
...
...
...
[Removed]
Highly Voted 2 years, 3 months ago
Selected Answer: A
The answer is A. The negative event is usually labeled as positive (e.g., fraud detection, customer default prediction, and here non-compliant picture identification). The question explicitly says, "ensure that the application does not falsely accept a non-compliant picture." So we should avoid falsely labeling a non-compliant image as compliant (negative). It is never mentioned in the question that false positives are also a concern. So, recall is better than F1-score for this problem.
upvoted 16 times
baimus
1 year, 2 months ago
The question explicitly states that this isn't the case, it's identifying compliant images, it is compliance that is the positive, so F1 is the only sensible metric.
upvoted 1 times
...
...
OpenKnowledge
Most Recent 1 month, 1 week ago
Selected Answer: D
A is definitely not the answer, since we need to minimize false positives. B is used for a balanced dataset, which is not the case for this problem. C doesn't make sense for this problem. D is the answer, since we need oversampling of the minority class (in this case, non-compliant pictures) to train the model for better prediction of that class.
upvoted 1 times
...
b7ad1d9
1 month, 3 weeks ago
Selected Answer: A
Minimizing false negatives is the goal, as you don't want non-compliant pics to "sneak" in, i.e. positives posing as negatives must be effectively found. AutoML is the easiest way to do so IF THE TRAINING DATASET is fairly balanced between compliant and non-compliant photos. If we assume the dataset is unbalanced with mostly compliant, then the model will not have great recall. With this assumption, D would be a better answer. However, this assumption is not given. So the simplest answer is A
upvoted 1 times
...
24bfb02
5 months ago
Selected Answer: B
Minimize false positives, that is the objective
upvoted 1 times
...
niamnesh
5 months, 3 weeks ago
Selected Answer: D
By oversampling non-compliant photos, you teach the model to better distinguish and not mistakenly accept non-compliant photos, thereby reducing false positives.
upvoted 3 times
...
bc3f222
8 months, 3 weeks ago
Selected Answer: B
Ideally it should minimize FP, but that's not an option. Option A is incorrect, as minimizing FN will increase FP. So the next best option is to target the F1 score, as that is the harmonic mean of precision and recall, so it will have the most impact on getting toward the ask.
upvoted 1 times
...
8619d79
9 months ago
Selected Answer: A
Compliant=negative, accepted non-compliant=false negative (I thought it is negative, to be accepted, but is not). So I need to minimize false negative, Recall
upvoted 2 times
...
moammary
9 months, 3 weeks ago
Selected Answer: A
Answer is A --> non-compliant photo is positive. Falsely accepting a non-compliant photo is a false negative.
upvoted 2 times
...
vinevixx
10 months ago
Selected Answer: B
The goal is the compliance of an image: a false positive means an image accepted but not compliant, and vice versa for false negatives
upvoted 1 times
...
nimbous
10 months, 2 weeks ago
Selected Answer: D
oversampling the negative class to avoid falsely labelling them as compliant
upvoted 2 times
...
Ankit267
10 months, 2 weeks ago
Selected Answer: B
Choice between A & B. A if the +ve class is non-compliant pics, B if the +ve class is compliant pics. As per the query, the +ve class is compliant pics - "to predict whether a user’s submitted profile photo meets the requirements". Though I feel the person who framed the question really wanted A to be the choice, it seems the question is wrongly framed - selected B
upvoted 1 times
...
thescientist
10 months, 2 weeks ago
Selected Answer: D
D: In this case, a false positive means accepting a non-compliant picture. You want to minimize these. By providing more examples of non-compliant pictures, you train the model to be more sensitive to identifying them and less likely to make this type of error.
upvoted 3 times
...
soumik_barori
10 months, 3 weeks ago
Selected Answer: D
1. Emphasizes the minority class (non-compliant pictures), ensuring the model better differentiates non-compliant images. 2. Balances the dataset to prevent the model from favouring compliant images disproportionately. 3. Provides flexibility to fine-tune the model for this specific use case.
upvoted 3 times
...
uatud3
11 months, 2 weeks ago
Selected Answer: B
It's B. You are optimizing for false positives, not false negatives (recall)
upvoted 1 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: D
D - minimizing false positives
upvoted 3 times
...
desertlotus1211
1 year ago
Answer is D: Since the goal is to minimize false positives (incorrectly accepting a non-compliant photo), having more examples of non-compliant photos in the training data will help the model better identify these cases. By training with more non-compliant examples, the model will learn to recognize these images more accurately, thus reducing the chance of falsely accepting them.
upvoted 3 times
...
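
Much of the disagreement in this thread is over which class counts as positive. A minimal sketch, assuming non-compliant = positive (the counts are hypothetical), makes the metrics concrete:

```python
def precision_recall_f1(tp, fp, fn):
    # standard definitions from the confusion matrix
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# hypothetical counts with "non-compliant" as the positive class:
# tp = non-compliant photos correctly rejected
# fp = compliant photos wrongly rejected
# fn = non-compliant photos wrongly accepted  <-- the error the question forbids
p, r, f1 = precision_recall_f1(tp=80, fp=5, fn=20)
```

Under this labeling, the 20 falsely accepted non-compliant photos are exactly the false negatives, so optimizing recall (option A) targets them directly; flip the labeling so compliant = positive and the same 20 photos become false positives, which is the reading behind the votes for B and D.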

Topic 1 Question 70

Exam Professional Machine Learning Engineer topic 1 question 70 discussion

You lead a data science team at a large international corporation. Most of the models your team trains are large-scale models using high-level TensorFlow APIs on AI Platform with GPUs. Your team usually takes a few weeks or months to iterate on a new version of a model. You were recently asked to review your team’s spending. How should you reduce your Google Cloud compute costs without impacting the model’s performance?

  • A. Use AI Platform to run distributed training jobs with checkpoints.
  • B. Use AI Platform to run distributed training jobs without checkpoints.
  • C. Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs with checkpoints.
  • D. Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs without checkpoints.
Suggested Answer: C 🗳️

Comments

seifou
Highly Voted 2 years, 11 months ago
Selected Answer: C
https://cloud.google.com/blog/products/ai-machine-learning/reduce-the-costs-of-ml-workflows-with-preemptible-vms-and-gpus?hl=en
upvoted 11 times
...
sashimii14
Most Recent 1 year ago
Selected Answer: C
C for me
upvoted 1 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: C
C) Preemptible VMs with Check points
upvoted 1 times
...
MultiCloudIronMan
1 year, 7 months ago
Selected Answer: C
Preemptible VMs are cheaper, and checkpoints let training resume if a VM is reclaimed
upvoted 4 times
...
libo1985
2 years, 1 month ago
I guess distributed training is not cheap. So C.
upvoted 1 times
...
joaquinmenendez
2 years, 1 month ago
C is the best approach because it allows you to reduce your compute costs without impacting the model's performance. Preemptible VMs are much cheaper than standard VMs, but they can be terminated at any time. By using checkpoints, you can ensure that your training job can be resumed if a preemptible VM is terminated. Also, even if training takes days, the checkpoints will prevent losing the progress if preemptible VMs go down.
upvoted 4 times
...
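The checkpoint-and-resume pattern behind answer C can be sketched in plain Python. This is a toy simulation, not the AI Platform or Kubeflow API; the `train` loop and checkpoint file are hypothetical stand-ins: a job on a preemptible VM periodically persists its progress, and a restarted job resumes from the last checkpoint instead of from step 0.

```python
import json
import os
import tempfile

CKPT = os.path.join(tempfile.mkdtemp(), "ckpt.json")

def save_checkpoint(step, weights):
    # Persist training progress so a preempted job can resume later.
    with open(CKPT, "w") as f:
        json.dump({"step": step, "weights": weights}, f)

def load_checkpoint():
    # Resume from the last checkpoint if one exists, else start fresh.
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            state = json.load(f)
        return state["step"], state["weights"]
    return 0, [0.0]

def train(total_steps, preempt_at=None):
    step, weights = load_checkpoint()
    while step < total_steps:
        if preempt_at is not None and step == preempt_at:
            return step                            # VM preempted mid-run
        weights = [w + 0.1 for w in weights]       # stand-in for a real update
        step += 1
        if step % 10 == 0:
            save_checkpoint(step, weights)         # checkpoint every 10 steps
    return step

# First run is preempted at step 25; the restart resumes from step 20
# (the last checkpointed multiple of 10), not from step 0.
train(100, preempt_at=25)
resumed_from, _ = load_checkpoint()
print(resumed_from)  # → 20
finished = train(100)
print(finished)      # → 100
```

Without the checkpoint file, the second run would repeat the first 25 steps, which is exactly the waste that makes preemptible VMs without checkpoints (option D) risky.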
Liting
2 years, 4 months ago
Selected Answer: C
To optimize cost, you should use Kubeflow.
upvoted 2 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
CloudKida
2 years, 6 months ago
Selected Answer: C
https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/overview AI Explanations helps you understand your model's outputs for classification and regression tasks. Whenever you request a prediction on AI Platform, AI Explanations tells you how much each feature in the data contributed to the predicted result. You can then use this information to verify that the model is behaving as expected, recognize bias in your models, and get ideas for ways to improve your model and your training data.
upvoted 1 times
...
_learner_
2 years, 6 months ago
Selected Answer: A
Preemptible VMs are valid for only 24 hours, but the question says training takes weeks or months to complete; that makes A the answer.
upvoted 2 times
...
tavva_prudhvi
2 years, 7 months ago
Additionally, AI Platform's autoscaling feature can automatically adjust the number of resources used based on the workload, further optimizing costs.
upvoted 1 times
tavva_prudhvi
2 years, 7 months ago
I think it’s a. By using distributed training jobs with checkpoints, you can train your models on multiple GPUs simultaneously, which reduces the training time. Checkpoints allow you to save the progress of your training jobs regularly, so if the training job gets interrupted or fails, you can restart it from the last checkpoint instead of starting from scratch. This saves time and resources, which reduces costs. Additionally, AI Platform's autoscaling feature can automatically adjust the number of resources used based on the workload, further optimizing costs.
upvoted 1 times
...
...
John_Pongthorn
2 years, 9 months ago
Is C out of date? AI Platform is now Vertex AI, so this is a simple scenario that the managed infrastructure would accommodate.
upvoted 1 times
...
ares81
2 years, 10 months ago
Selected Answer: A
It's A.
upvoted 2 times
...
hiromi
2 years, 11 months ago
Selected Answer: C
It seems C - https://www.kubeflow.org/docs/distributions/gke/pipelines/preemptible/ - https://cloud.google.com/optimization/docs/guide/checkpointing
upvoted 4 times
...
ares81
2 years, 11 months ago
"A Preemptible VM (PVM) is a Google Compute Engine (GCE) virtual machine (VM) instance that can be purchased for a steep discount as long as the customer accepts that the instance will terminate after 24 hours." This excludes C and D. Checkpoints are needed for long processing, so A.
upvoted 3 times
...
neochaotic
2 years, 11 months ago
Selected Answer: C
C - Reduce cost with preemptive instances and add checkpoints to snapshot intermediate results
upvoted 3 times
...
LearnSodas
2 years, 11 months ago
Selected Answer: A
Saving checkpoints avoids re-running from scratch.
upvoted 2 times
...

Topic 1 Question 71

You need to train a regression model based on a dataset containing 50,000 records that is stored in BigQuery. The data includes a total of 20 categorical and numerical features with a target variable that can include negative values. You need to minimize effort and training time while maximizing model performance. What approach should you take to train this regression model?

  • A. Create a custom TensorFlow DNN model
  • B. Use BQML XGBoost regression to train the model.
  • C. Use AutoML Tables to train the model without early stopping.
  • D. Use AutoML Tables to train the model with RMSLE as the optimization objective.
Suggested Answer: B 🗳️

Comments

abneural
Highly Voted 2 years, 3 months ago
Selected Answer: B
Ans B. C --> no early stopping means longer training time. D --> the RMSLE metric needs non-negative Y values.
upvoted 5 times
...
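A quick way to see why RMSLE (option D) conflicts with negative target values: the metric takes log(1 + value), which is undefined for values of −1 or less. A minimal sketch of the formula (an illustrative implementation, not the AutoML one):

```python
import math

def rmsle(actual, predicted):
    # Root-mean-squared logarithmic error. log1p(x) = log(1 + x) is only
    # defined for x > -1, and the metric is documented for non-negative values.
    return math.sqrt(sum(
        (math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)
    ) / len(actual))

print(rmsle([10, 20], [12, 18]))  # fine: all values non-negative

try:
    rmsle([-5, 20], [12, 18])     # negative target from the question's dataset
except ValueError as e:
    print("RMSLE fails on negative targets:", e)
```

Since the question says the target can include negative values, any RMSLE-based training objective would error out (or require shifting the target), which is why B is the lower-effort choice.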
Fer660
Most Recent 2 months, 2 weeks ago
Selected Answer: B
Not A: unnecessary effort. B: correct. We might think of XGBoost as being primarily for classification, but it handles regression just as well, and effort is low as everything is done within BQ. Not C: avoiding early stopping is a clear tell, as this goes against the goal of minimizing training time. Not D: RMSLE uses logarithms, and the negative target values will be an issue here.
upvoted 2 times
...
gscharly
1 year ago
Selected Answer: B
Went with B
upvoted 2 times
...
M25
2 years ago
Selected Answer: B
Went with B
upvoted 1 times
...
John_Pongthorn
2 years, 3 months ago
Selected Answer: B
B and C are the most likely because of the regression approach, but RMSLE does not allow you to train with negative labels, per https://cloud.google.com/automl-tables/docs/evaluate#evaluation_metrics_for_regression_models RMSLE: The root-mean-squared logarithmic error metric is similar to RMSE, except that it uses the natural logarithm of the predicted and actual values plus 1. RMSLE penalizes under-prediction more heavily than over-prediction. It can also be a good metric when you don't want to penalize differences for large prediction values more heavily than for small prediction values. This metric ranges from zero to infinity; a lower value indicates a higher quality model. The RMSLE evaluation metric is returned only if all label and predicted values are non-negative.
upvoted 1 times
...
John_Pongthorn
2 years, 3 months ago
Selected Answer: D
BQML XGBoost ==> you have to know SQL to write the statement, and B doesn't mention how to get max performance. Meanwhile, with AutoML you just click and select, click and select, click and select to get it done, and D refers to a measurement for maximizing model performance. You can literally minimize effort.
upvoted 2 times
John_Pongthorn
2 years, 3 months ago
To John_Pongthorn: you are wrong 55555, it must genuinely be B.
upvoted 2 times
...
...
zeic
2 years, 4 months ago
I recommend option D, Use AutoML Tables to train the model with RMSLE as the optimization objective. Using AutoML Tables to train the model can be a convenient and efficient way to minimize effort and training time while still maximizing model performance. In this case, using RMSLE as the optimization objective can be a good choice because it is a good fit for regression models with negative values in the target variable.
upvoted 2 times
...
MithunDesai
2 years, 4 months ago
Selected Answer: B
B is correct
upvoted 3 times
...
hiromi
2 years, 4 months ago
Selected Answer: B
It seems B to me
upvoted 1 times
...
seifou
2 years, 5 months ago
Selected Answer: B
B is correct
upvoted 1 times
...
ares81
2 years, 5 months ago
It's B.
upvoted 1 times
...
YangG
2 years, 5 months ago
B. BigQuery is a keyword for me
upvoted 2 times
...

Topic 1 Question 72

You are building a linear model with over 100 input features, all with values between –1 and 1. You suspect that many features are non-informative. You want to remove the non-informative features from your model while keeping the informative ones in their original form. Which technique should you use?

  • A. Use principal component analysis (PCA) to eliminate the least informative features.
  • B. Use L1 regularization to reduce the coefficients of uninformative features to 0.
  • C. After building your model, use Shapley values to determine which features are the most informative.
  • D. Use an iterative dropout technique to identify which features do not degrade the model when removed.
Suggested Answer: B 🗳️

Comments

hiromi
Highly Voted 2 years, 4 months ago
Selected Answer: B
L1 regularization is good for feature selection: https://www.quora.com/How-does-the-L1-regularization-method-help-in-feature-selection https://developers.google.com/machine-learning/crash-course/regularization-for-sparsity/l1-regularization
upvoted 8 times
ailiba
2 years, 2 months ago
but this is not a sparse input vector, just a high dimensional vector where many features are not relevant.
upvoted 1 times
...
...
ares81
Highly Voted 2 years, 5 months ago
A. PCA reconfigures the features, so no. C. After building your model, so no. D. Dropout should be in the model and it doesn't tell us which features are informative or not. Big No! For me, it's B.
upvoted 6 times
...
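How L1 regularization zeroes out uninformative coefficients can be illustrated with the soft-thresholding operator, the proximal step used inside lasso-style solvers. This is a sketch of the mechanism on toy weights, not a full training loop; feature names and values are made up:

```python
def soft_threshold(w, lam):
    # Proximal operator of the L1 penalty: shrinks |w| by lam and snaps
    # anything that would cross zero to exactly 0.
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

# Toy unregularized coefficients: two informative, three near-noise.
weights = {"f1": 0.90, "f2": -0.75, "f3": 0.03, "f4": -0.05, "f5": 0.01}
lam = 0.1  # regularization strength

regularized = {k: soft_threshold(w, lam) for k, w in weights.items()}
selected = [k for k, w in regularized.items() if w != 0.0]
print(regularized)
print(selected)  # → ['f1', 'f2']
```

The informative features keep (slightly shrunk) coefficients while the noise features land at exactly 0, effectively removing them from the linear model — which is what option B relies on, unlike PCA, which transforms every feature.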
b7ad1d9
Most Recent 1 month, 3 weeks ago
Selected Answer: B
L1 performs feature selection by shrinking weights of insignificant features to zero
upvoted 2 times
...
OpenKnowledge
2 months ago
Selected Answer: B
L1 regularization is excellent for feature selection, especially when you have a large number of features or believe many are irrelevant.
upvoted 1 times
...
PhilipKoku
11 months, 1 week ago
Selected Answer: B
B) L1 Regularisation
upvoted 1 times
...
Liting
1 year, 10 months ago
Selected Answer: B
Went with B
upvoted 1 times
...
M25
2 years ago
Selected Answer: B
Went with B
upvoted 1 times
...
Antmal
2 years, 1 month ago
Selected Answer: B
L1 regularization penalises weights in proportion to the sum of the absolute value of the weights. L1 regularization helps drive the weights of irrelevant or barely relevant features to exactly 0. A feature with a weight of 0 is effectively removed from the model. https://developers.google.com/machine-learning/glossary#L1_regularization
upvoted 1 times
...
tavva_prudhvi
2 years, 1 month ago
Its B. See my explanations under the comments why its not C.
upvoted 1 times
...
enghabeth
2 years, 3 months ago
Selected Answer: B
It's the best way, because you reduce the non-relevant (in this case, non-informative) features.
upvoted 1 times
...
behzadsw
2 years, 4 months ago
Selected Answer: A
The features must be removed from the model. They are not removed when doing L1 regularization. PCA is used prior to training.
upvoted 2 times
tavva_prudhvi
2 years, 1 month ago
That is a good point. PCA is a technique used to reduce the dimensionality of the dataset by transforming the original features into a new set of uncorrelated features. This can help to eliminate the least informative features and reduce the computational burden of building a model with many input features. However, it is important to note that PCA does not necessarily remove the original features from the model, but rather transforms them into a new set of features. On the other hand, L1 regularization can effectively remove the impact of non-informative features by setting their coefficients to 0 during the model building process. Therefore, both techniques can be useful for addressing the issue of non-informative features in a linear model, depending on the specific needs of the problem.
upvoted 1 times
...
jamesking1103
2 years, 4 months ago
should be A as keeping the informative ones in their original form
upvoted 3 times
libo1985
1 year, 7 months ago
How can PCA keep the original form?
upvoted 1 times
...
...
...
JeanEl
2 years, 5 months ago
Selected Answer: B
Agree with B
upvoted 2 times
...

Topic 1 Question 73

You work for a global footwear retailer and need to predict when an item will be out of stock based on historical inventory data. Customer behavior is highly dynamic since footwear demand is influenced by many different factors. You want to serve models that are trained on all available data, but track your performance on specific subsets of data before pushing to production. What is the most streamlined and reliable way to perform this validation?

  • A. Use the TFX ModelValidator tools to specify performance metrics for production readiness.
  • B. Use k-fold cross-validation as a validation strategy to ensure that your model is ready for production.
  • C. Use the last relevant week of data as a validation set to ensure that your model is performing accurately on current data.
  • D. Use the entire dataset and treat the area under the receiver operating characteristics curve (AUC ROC) as the main metric.
Suggested Answer: A 🗳️

Comments

John_Pongthorn
Highly Voted 2 years, 3 months ago
https://www.tensorflow.org/tfx/guide/evaluator
upvoted 14 times
...
hiromi
Highly Voted 2 years, 4 months ago
Selected Answer: C
It seems C to me. B is wrong because "Many machine learning techniques don't work well here due to the sequential nature and temporal correlation of time series. For example, k-fold cross validation can cause data leakage; models need to be retrained to generate new forecasts" - https://cloud.google.com/learn/what-is-time-series
upvoted 10 times
...
OpenKnowledge
Most Recent 4 weeks ago
Selected Answer: C
Footwear demand is influenced by many different factors. Timing within the year (a seasonality factor) should affect the demand for footwear. So C should be the answer, as it considers the timing factor. K-fold cross-validation is not suitable for time series data.
upvoted 1 times
...
PhilipKoku
11 months, 1 week ago
Selected Answer: A
A) TFX ModelValidator is designed to handle the exact needs described in the scenario: training on all data, validating on specific subsets, and ensuring production readiness with comprehensive performance metrics. This makes it the most streamlined and reliable method compared to other options, which either lack specificity in production readiness (B), are too narrow in scope (C), or risk overfitting and inadequate validation (D).
upvoted 6 times
...
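The slice-based validation that TFX ModelValidator/Evaluator performs can be roughly sketched as: compute the metric on each subset (slice) of the evaluation data and gate the push on every slice meeting a threshold. Names like `evaluate_by_slice` and the threshold value below are illustrative, not TFX APIs:

```python
from collections import defaultdict

def accuracy(rows):
    return sum(r["pred"] == r["label"] for r in rows) / len(rows)

def evaluate_by_slice(rows, slice_key, threshold):
    # Group evaluation examples by a slicing feature, compute the metric per
    # slice, and only declare the model production-ready if every slice passes.
    slices = defaultdict(list)
    for r in rows:
        slices[r[slice_key]].append(r)
    per_slice = {k: accuracy(v) for k, v in slices.items()}
    ready = all(m >= threshold for m in per_slice.values())
    return per_slice, ready

rows = [
    {"region": "US", "label": 1, "pred": 1},
    {"region": "US", "label": 0, "pred": 0},
    {"region": "EU", "label": 1, "pred": 0},
    {"region": "EU", "label": 0, "pred": 0},
]
per_slice, ready = evaluate_by_slice(rows, "region", threshold=0.9)
print(per_slice)  # → {'US': 1.0, 'EU': 0.5}
print(ready)      # → False: the EU slice fails, so don't push to production
```

A global metric would hide the weak EU slice (overall accuracy is 0.75); per-slice gating is exactly the "track performance on specific subsets before pushing" requirement in the question.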
gscharly
1 year ago
Selected Answer: A
Evaluator TFX lets you evaluate the performance on different subsets of data https://www.tensorflow.org/tfx/guide/evaluator
upvoted 3 times
...
pinimichele01
1 year ago
Selected Answer: A
The Evaluator TFX pipeline component performs deep analysis on the training results for your models, to help you understand how your model performs on subsets of your data.
upvoted 3 times
...
edoo
1 year, 2 months ago
Selected Answer: A
I prefer A to C because 1 week of data may be insufficient to generalize the model and could lead to overfitting on the validation subset.
upvoted 4 times
...
pmle_nintendo
1 year, 2 months ago
Selected Answer: C
option C provides a streamlined and reliable approach that focuses on evaluating the model's performance on the most relevant and recent data, which is essential for predicting out-of-stock events in a dynamic retail setting.
upvoted 1 times
...
Mickey321
1 year, 5 months ago
Selected Answer: A
Either A or C, but C uses only the last week of data, which is not specific subsets of data.
upvoted 2 times
...
AdiML
1 year, 7 months ago
Answer should be C, we are dealing with dynamic data and the "last" data is more relevant to have an idea about the future performance
upvoted 1 times
...
joaquinmenendez
1 year, 7 months ago
Selected Answer: C
Option C, because it allows you to track your model's performance on the most *recent* data, which is the most relevant data for predicting stockout risk. Given that the preferences are dynamic, the most important thing is that the model WORKS correctly with the newest data
upvoted 1 times
...
atlas_lyon
1 year, 9 months ago
Selected Answer: A
I will go for A. I don't think the aim of the question is to test whether candidates know that a component is deprecated. Note that ModelValidator has been fused with Evaluator, so we can imagine the question would have been updated in recent exams. Evaluator enables testing on specific subsets with the metrics we want, then indicates to the Pusher component to push the new model to production if the "model is good enough". This makes the pipeline quite streamlined (https://www.tensorflow.org/tfx/guide/evaluator). B: wrong: using historical data, one should watch for data leakage. C: wrong: we want to track performance on specific subsets of data (not necessarily the last week), maybe to do some targeting/segmentation? Who knows. D: wrong because we want to track performance on specific subsets of data, not the entire dataset.
upvoted 3 times
tavva_prudhvi
1 year, 9 months ago
Bro, that's not TFX ModelValidator, it's Evaluator; are both the same?
upvoted 1 times
MultipleWorkerMirroredStrategy
1 year, 6 months ago
TFXModelValidator is deprecated, but its behaviour can be replicated using the Evaluator object - which is the point he tried to make. See the docs here: https://www.tensorflow.org/tfx/guide/modelval
upvoted 1 times
...
...
...
Liting
1 year, 10 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
Voyager2
1 year, 11 months ago
Selected Answer: C
I think that it should be C for the following key point ", but track your performance on specific subsets of data before pushing to production" So the ask is which subset of data you should use.
upvoted 1 times
...
julliet
1 year, 11 months ago
Could someone explain why A is better option than C? C is correct one in terms of evaluation overall, no doubt. But do we choose TFX because it understands we are dealing with time series? Or is it the "specific subset" in the Q that makes us thinking we have already chosen the data of last period and just need to push it into the TFX?
upvoted 1 times
...
aw_49
1 year, 11 months ago
Selected Answer: C
A is deprecated.. so C
upvoted 1 times
...
M25
2 years ago
Selected Answer: A
Went with A
upvoted 3 times
...

Topic 1 Question 74

You have deployed a model on Vertex AI for real-time inference. During an online prediction request, you get an “Out of Memory” error. What should you do?

  • A. Use batch prediction mode instead of online mode.
  • B. Send the request again with a smaller batch of instances.
  • C. Use base64 to encode your data before using it for prediction.
  • D. Apply for a quota increase for the number of prediction requests.
Suggested Answer: B 🗳️

Comments

hiromi
Highly Voted 2 years, 4 months ago
Selected Answer: B
B is the answer 429 - Out of Memory https://cloud.google.com/ai-platform/training/docs/troubleshooting
upvoted 26 times
tavva_prudhvi
2 years, 1 month ago
Upvote this comment, its the right answer!
upvoted 4 times
...
...
PhilipKoku
Most Recent 11 months, 1 week ago
Selected Answer: B
B) Use smaller set of tokens
upvoted 1 times
...
pmle_nintendo
1 year, 2 months ago
Selected Answer: B
By reducing the batch size of instances sent for prediction, you decrease the memory footprint of each request, potentially alleviating the out-of-memory issue. However, be mindful that excessively reducing the batch size might impact the efficiency of your prediction process.
upvoted 1 times
...
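The fix in answer B amounts to splitting one oversized request into several smaller ones and stitching the results back together. A minimal sketch, where `send_prediction` is a hypothetical stand-in for the online prediction call (the real service returns HTTP 429; here a `MemoryError` simulates it):

```python
def send_prediction(instances, max_instances=8):
    # Hypothetical stand-in for an online prediction endpoint that runs
    # out of memory when the request payload is too large.
    if len(instances) > max_instances:
        raise MemoryError("429: Out of Memory")
    return [x * 2 for x in instances]  # dummy "predictions"

def predict_in_batches(instances, batch_size):
    # Re-send the request as several smaller batches, preserving order.
    results = []
    for i in range(0, len(instances), batch_size):
        results.extend(send_prediction(instances[i:i + batch_size]))
    return results

instances = list(range(20))
try:
    send_prediction(instances)          # one big request fails
except MemoryError as e:
    print(e)
print(predict_in_batches(instances, batch_size=5))  # smaller batches succeed
```

The trade-off noted in the comments applies: smaller batches mean more round trips, so the batch size should be reduced only as far as needed to fit in memory.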
M25
2 years ago
Selected Answer: B
Went with B
upvoted 1 times
...
tavva_prudhvi
2 years, 1 month ago
B. Send the request again with a smaller batch of instances. If you are getting an "Out of Memory" error during an online prediction request, it suggests that the amount of data you are sending in each request is too large and is exceeding the available memory. To resolve this issue, you can try sending the request again with a smaller batch of instances. This reduces the amount of data being sent in each request and helps avoid the out-of-memory error. If the problem persists, you can also try increasing the machine type or the number of instances to provide more resources for the prediction service.
upvoted 3 times
...
BenMS
2 years, 2 months ago
Selected Answer: C
This question is about prediction not training - and specifically it's about _online_ prediction (aka realtime serving). All the answers are about batch workloads apart from C.
upvoted 1 times
BenMS
2 years, 2 months ago
Okay, option D is also about online serving, but the error message indicates a problem for individual predictions, which will not be fixed by increasing the number of predictions per second.
upvoted 1 times
Antmal
2 years, 1 month ago
@BenMS this feels like a trick question; it makes one zone in on the word batch. https://cloud.google.com/ai-platform/training/docs/troubleshooting states that when an error occurs with an online prediction request, you usually get an HTTP status code back from the service. These are some commonly encountered codes and their meaning in the context of online prediction: 429 - Out of Memory. The processing node ran out of memory while running your model. There is no way to increase the memory allocated to prediction nodes at this time. You can try these things to get your model to run: reduce your model size by (1) using less precise variables, (2) quantizing your continuous data, (3) reducing the size of other input features (using smaller vocab sizes, for example); or (4) send the request again with a smaller batch of instances.
upvoted 3 times
...
...
OpenKnowledge
1 month, 1 week ago
Base64 encoding does not reduce the size of data; it actually increases it. Base64 encoding wouldn't help address the memory error.
upvoted 1 times
...
...
koakande
2 years, 4 months ago
Selected Answer: B
https://cloud.google.com/ai-platform/training/docs/troubleshooting
upvoted 2 times
...
ares81
2 years, 5 months ago
The correct answer is B.
upvoted 1 times
...
LearnSodas
2 years, 5 months ago
Selected Answer: B
answer B as reported here: https://cloud.google.com/ai-platform/training/docs/troubleshooting
upvoted 1 times
...
Sivaram06
2 years, 5 months ago
Selected Answer: B
https://cloud.google.com/ai-platform/training/docs/troubleshooting#http_status_codes
upvoted 1 times
...

Topic 1 Question 75

You work at a subscription-based company. You have trained an ensemble of trees and neural networks to predict customer churn, which is the likelihood that customers will not renew their yearly subscription. The average prediction is a 15% churn rate, but for a particular customer the model predicts that they are 70% likely to churn. The customer has a product usage history of 30%, is located in New York City, and became a customer in 1997. You need to explain the difference between the actual prediction, a 70% churn rate, and the average prediction. You want to use Vertex Explainable AI. What should you do?

  • A. Train local surrogate models to explain individual predictions.
  • B. Configure sampled Shapley explanations on Vertex Explainable AI.
  • C. Configure integrated gradients explanations on Vertex Explainable AI.
  • D. Measure the effect of each feature as the weight of the feature multiplied by the feature value.
Suggested Answer: B 🗳️

Comments

PhilipKoku
11 months, 1 week ago
Selected Answer: B
B) Shapley
upvoted 2 times
...
pmle_nintendo
1 year, 2 months ago
Selected Answer: B
Sampled Shapley explanations offer a more sophisticated and model-agnostic method for understanding feature importance and contributions to predictions.
upvoted 3 times
...
adavid213
1 year, 6 months ago
Selected Answer: B
I agree, it seems like B
upvoted 1 times
...
NickHapton
1 year, 10 months ago
B refer: https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#compare-methods
upvoted 2 times
...
M25
2 years ago
Selected Answer: B
Went with B
upvoted 2 times
...
CloudKida
2 years ago
Selected Answer: B
Assigns credit for the outcome to each feature, and considers different permutations of the features. This method provides a sampling approximation of exact Shapley values. Sampled Shapley recommended model type: non-differentiable models, such as ensembles of trees and neural networks. https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/overview
upvoted 2 times
...
enghabeth
2 years, 3 months ago
Selected Answer: B
Sampled Shapley works well for these models, which are meta-ensembles of trees and neural networks. https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#sampled-shapley
upvoted 3 times
...
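The sampled Shapley method in answer B can be approximated in a few lines: average each feature's marginal contribution over random feature orderings, flipping features one at a time from a baseline ("average customer") to the instance's values. This is a toy sketch with made-up churn numbers, not the Vertex Explainable AI implementation; the feature names and coefficients are hypothetical:

```python
import random

def model(x):
    # Toy churn model; the explainer treats it as a black box.
    return 0.05 + 0.5 * x["low_usage"] + 0.15 * x["joined_1997"]

def sampled_shapley(model, instance, baseline, n_samples=200, seed=0):
    # Estimate each feature's Shapley value by averaging its marginal
    # contribution over randomly sampled feature orderings.
    rng = random.Random(seed)
    features = list(instance)
    attributions = {f: 0.0 for f in features}
    for _ in range(n_samples):
        order = features[:]
        rng.shuffle(order)
        current = dict(baseline)
        prev = model(current)
        for f in order:
            current[f] = instance[f]   # flip f from baseline to instance value
            cur = model(current)
            attributions[f] += cur - prev
            prev = cur
    return {f: v / n_samples for f, v in attributions.items()}

instance = {"low_usage": 1.0, "joined_1997": 1.0}    # the 70%-churn customer
baseline = {"low_usage": 0.14, "joined_1997": 0.2}   # the "average" customer
attr = sampled_shapley(model, instance, baseline)
print(attr)
# By construction the attributions sum to model(instance) - model(baseline):
# they explain the gap between this customer's 0.70 and the average 0.15.
print(round(sum(attr.values()), 6) == round(model(instance) - model(baseline), 6))
```

This additivity (attributions explain the difference from the baseline prediction) is exactly what the question asks for, and the permutation-sampling scheme is why it works for non-differentiable tree ensembles where integrated gradients does not.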
John_Pongthorn
2 years, 3 months ago
Selected Answer: B
B is optimal for tabular data (tree or DNN). C, integrated gradients explanations on Vertex Explainable AI, is used for images.
upvoted 2 times
John_Pongthorn
2 years, 3 months ago
https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#compare-methods
upvoted 4 times
...
...
ares81
2 years, 4 months ago
Selected Answer: B
It should be B.
upvoted 1 times
...
emma_aic
2 years, 4 months ago
Selected Answer: B
https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#sampled-shapley
upvoted 2 times
...
egdiaa
2 years, 4 months ago
B - For sure as per GCP Docs here: https://cloud.google.com/vertex-ai/docs/explainable-ai/overview
upvoted 1 times
...
hiromi
2 years, 4 months ago
Selected Answer: B
B - https://christophm.github.io/interpretable-ml-book/shapley.html - https://cloud.google.com/vertex-ai/docs/explainable-ai/overview
upvoted 2 times
...
JeanEl
2 years, 5 months ago
Selected Answer: B
Agree with B : individual instance prediction + ensemble of trees and neural networks (recommended model types for Sampled Shapley : "Non-differentiable models, such as ensembles of trees and neural networks " ). Check out the link below : https://cloud.google.com/vertex-ai/docs/explainable-ai/overview
upvoted 3 times
...
YangG
2 years, 5 months ago
Selected Answer: C
It is about an individual instance prediction. I think we should use the integrated gradients method.
upvoted 2 times
...
ares81
2 years, 5 months ago
It seems D.
upvoted 1 times
...

Topic 1 Question 76

You are working on a classification problem with time series data. After conducting just a few experiments using random cross-validation, you achieved an Area Under the Receiver Operating Characteristic Curve (AUC ROC) value of 99% on the training data. You haven’t explored using any sophisticated algorithms or spent any time on hyperparameter tuning. What should your next step be to identify and fix the problem?

  • A. Address the model overfitting by using a less complex algorithm and use k-fold cross-validation.
  • B. Address data leakage by applying nested cross-validation during model training.
  • C. Address data leakage by removing features highly correlated with the target value.
  • D. Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.
Suggested Answer: B 🗳️

Comments

hiromi
Highly Voted 2 years, 10 months ago
Selected Answer: B
B (same question 48) - https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9
upvoted 5 times
...
pinimichele01
Highly Voted 1 year, 6 months ago
Selected Answer: B
random cross-validation time series data -> B
upvoted 5 times
...
OpenKnowledge
Most Recent 4 weeks ago
Selected Answer: B
Both random cross-validation and k-fold cross-validation are unsuitable for time series data, as they can introduce data leakage, leading to biased estimates/predictions. Nested cross-validation works with time series data, avoiding data leakage and model overfitting.
upvoted 1 times
...
hit_cloudie
5 months, 3 weeks ago
Selected Answer: C
C, see desertlotus1211
upvoted 1 times
...
desertlotus1211
9 months, 3 weeks ago
Selected Answer: C
You are working with time series data yet used random cross-validation, and you immediately achieved an extremely high AUC (99%) with little effort. This is a red flag for data leakage—meaning information from the future (or directly from the target) is leaking into the training process C is better answer
upvoted 3 times
...
gscharly
1 year, 7 months ago
Selected Answer: B
B with nested cross validation.
upvoted 2 times
pinimichele01
1 year, 6 months ago
can you explain me why?
upvoted 1 times
...
...
Werner123
1 year, 8 months ago
Selected Answer: B
"99% on training data" -> Data leakage "random cross-validation" -> Not suitable for time series, use "nested cross-validation"
upvoted 3 times
...
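Why random k-fold leaks on time series, and what the time-ordered (forward-chaining) alternative looks like, can be sketched as follows. The helper below is illustrative, not a library API: in every split, all validation indices come strictly after all training indices, so the model never trains on its own future.

```python
def forward_chaining_splits(n, n_folds):
    # Time-ordered cross-validation: train on [0, cut), validate on the
    # next block. Random k-fold would scatter future rows into training,
    # leaking information and inflating metrics like the 99% AUC here.
    fold = n // (n_folds + 1)
    splits = []
    for k in range(1, n_folds + 1):
        train_idx = list(range(0, k * fold))
        valid_idx = list(range(k * fold, (k + 1) * fold))
        splits.append((train_idx, valid_idx))
    return splits

for train_idx, valid_idx in forward_chaining_splits(n=12, n_folds=3):
    # Every validation index is strictly after every training index.
    assert max(train_idx) < min(valid_idx)
    print(f"train 0..{max(train_idx)}  validate {min(valid_idx)}..{max(valid_idx)}")
```

Wrapping hyperparameter selection inside each of these outer splits gives the nested cross-validation that answer B refers to.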
pmle_nintendo
1 year, 8 months ago
Selected Answer: D
Options B and C (Address data leakage by applying nested cross-validation during model training; Address data leakage by removing features highly correlated with the target value) are less relevant in this scenario because the primary concern appears to be overfitting rather than data leakage. Data leakage typically involves inadvertent inclusion of information from the test set in the training process, which may lead to overly optimistic performance metrics. However, there is no indication that data leakage is the cause of the high AUC ROC value in this case.
upvoted 1 times
503b759
12 months ago
Data leakage is occurring owing to the use of k-fold cross-validation, because of the time series nature of the data.
upvoted 1 times
...
...
pico
1 year, 12 months ago
Selected Answer: D
Options A and B also address overfitting, but they involve different strategies. Option A suggests using a less complex algorithm and k-fold cross-validation. While this can be effective, it might be premature to change the algorithm without first exploring hyperparameter tuning. Option B suggests addressing data leakage, which is a different issue and may not be the primary cause of overfitting in this scenario.
upvoted 3 times
...
humancomputation
2 years, 1 month ago
Selected Answer: B
B with nested cross validation.
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 2 times
...
BenMS
2 years, 8 months ago
Selected Answer: B
Nested cross-validation to reduce data leakage - same as a previous question.
upvoted 1 times
...
Alexarr6
2 years, 8 months ago
Selected Answer: B
It`s B
upvoted 1 times
...
ares81
2 years, 11 months ago
To say overfitting, I should have results on testing data, so it's data leakage. Common sense excludes C, so it's B.
upvoted 1 times
...

Topic 1 Question 77

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 77 discussion

You need to execute a batch prediction on 100 million records in a BigQuery table with a custom TensorFlow DNN regressor model, and then store the predicted results in a BigQuery table. You want to minimize the effort required to build this inference pipeline. What should you do?

  • A. Import the TensorFlow model with BigQuery ML, and run the ml.predict function.
  • B. Use the TensorFlow BigQuery reader to load the data, and use the BigQuery API to write the results to BigQuery.
  • C. Create a Dataflow pipeline to convert the data in BigQuery to TFRecords. Run a batch inference on Vertex AI Prediction, and write the results to BigQuery.
  • D. Load the TensorFlow SavedModel in a Dataflow pipeline. Use the BigQuery I/O connector with a custom function to perform the inference within the pipeline, and write the results to BigQuery.
Suggested Answer: A 🗳️

Comments

hiromi
Highly Voted 2 years, 4 months ago
Selected Answer: A
A should work with less effort - https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models#api - https://towardsdatascience.com/how-to-do-batch-predictions-of-tensorflow-models-directly-in-bigquery-ffa843ebdba6
upvoted 12 times
...
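The two statements behind option A can be sketched as SQL builders; this is a hedged sketch, and the dataset, table, and Cloud Storage names are placeholders, not from the question:

```python
def import_model_sql(model_name, saved_model_gcs_path):
    # BigQuery ML can import a TensorFlow SavedModel directly from Cloud Storage
    # (the path typically ends in /* to pick up the SavedModel files).
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS (MODEL_TYPE='TENSORFLOW', MODEL_PATH='{saved_model_gcs_path}')"
    )

def batch_predict_sql(model_name, source_table, dest_table):
    # ML.PREDICT runs batch inference inside BigQuery, so the 100M rows
    # never leave the warehouse and no export step is needed.
    return (
        f"CREATE OR REPLACE TABLE `{dest_table}` AS\n"
        f"SELECT * FROM ML.PREDICT(MODEL `{model_name}`,\n"
        f"  (SELECT * FROM `{source_table}`))"
    )

print(import_model_sql("proj.ds.dnn_model", "gs://my-bucket/saved_model/*"))
print(batch_predict_sql("proj.ds.dnn_model", "proj.ds.inputs", "proj.ds.preds"))
```

Keeping both steps as plain SQL is what makes this the minimum-effort pipeline compared with standing up a Dataflow or Vertex AI batch job.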
OpenKnowledge
Most Recent 4 weeks ago
Selected Answer: A
A is the answer for this case, since minimizing effort is one of the goals. But C is the best solution in general.
upvoted 1 times
...
desertlotus1211
9 months, 3 weeks ago
Selected Answer: D
BigQuery ML does not support importing arbitrary custom TensorFlow models for direct inference
upvoted 2 times
...
livewalk
11 months, 1 week ago
Selected Answer: D
BigQuery ML might not support custom TensorFlow DNN models directly.
upvoted 2 times
...
etienne0
1 year, 2 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
pawan94
1 year, 4 months ago
Simplest doesn't mean it is the most efficient/optimal. If I follow the best practices offered by Google for a serving/inference pipeline, I would go with Vertex AI predictions. Read more here: https://cloud.google.com/architecture/ml-on-gcp-best-practices#machine-learning-development
upvoted 2 times
etienne0
1 year, 2 months ago
Agreed, I'll also go with C.
upvoted 1 times
...
...
M25
2 years ago
Selected Answer: A
Went with A
upvoted 2 times
...
JamesDoe
2 years, 1 month ago
Selected Answer: A
https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models
upvoted 3 times
...
enghabeth
2 years, 3 months ago
Selected Answer: A
For this: https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-inference-overview Predict the label, either a numerical value for regression tasks or a categorical value for classification tasks, with a DNN regression.
upvoted 3 times
...
ares81
2 years, 5 months ago
ml.predict: https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models#api --> A
upvoted 2 times
...
LearnSodas
2 years, 5 months ago
Selected Answer: A
Answer A as the simplest
upvoted 2 times
...

Topic 1 Question 78

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 78 discussion

You are creating a deep neural network classification model using a dataset with categorical input values. Certain columns have a cardinality greater than 10,000 unique values. How should you encode these categorical values as input into the model?

  • A. Convert each categorical value into an integer value.
  • B. Convert the categorical string data to one-hot hash buckets.
  • C. Map the categorical variables into a vector of boolean values.
  • D. Convert each categorical value into a run-length encoded string.
Suggested Answer: B 🗳️

Comments

CloudKida
Highly Voted 2 years, 6 months ago
Selected Answer: B
https://cloud.google.com/ai-platform/training/docs/algorithms/wide-and-deep If the column is categorical with high cardinality, then the column is treated with hashing, where the number of hash buckets equals to the square root of the number of unique values in the column.
upvoted 5 times
...
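The hashing treatment quoted above can be shown in a few lines of pure Python; this is only a sketch, with zlib.crc32 standing in for whatever hash function the framework actually uses, and the square-root bucket rule taken from the doc quoted above:

```python
import math
import zlib

def hash_bucket(value: str, num_buckets: int) -> int:
    # Deterministic hash -> fixed-size bucket id; collisions are accepted
    # as the price for a bounded feature space.
    return zlib.crc32(value.encode("utf-8")) % num_buckets

# Square-root rule: buckets = sqrt(number of unique values in the column).
num_unique = 10_000
num_buckets = int(math.sqrt(num_unique))  # 100 buckets instead of 10,000 columns

bucket = hash_bucket("product_id_42", num_buckets)
assert 0 <= bucket < num_buckets
```

The bucket id can then be one-hot or embedding encoded, which is why option B scales where a plain one-hot over 10,000+ values (option C) does not.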
PhilipKoku
Most Recent 1 year, 5 months ago
Selected Answer: B
B) Hash buckets
upvoted 2 times
...
etienne0
1 year, 8 months ago
Selected Answer: A
went with A
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 2 times
...
JamesDoe
2 years, 7 months ago
Selected Answer: B
B. The other options solves nada.
upvoted 1 times
...
enghabeth
2 years, 9 months ago
Selected Answer: B
https://towardsdatascience.com/getting-deeper-into-categorical-encodings-for-machine-learning-2312acd347c8 When you have millions of unique values, try hash encoding.
upvoted 1 times
...
John_Pongthorn
2 years, 9 months ago
Selected Answer: B
B, unconditionally. https://cloud.google.com/ai-platform/training/docs/algorithms/xgboost#analysis If the column is categorical with high cardinality, then the column is treated with hashing, where the number of hash buckets equals the square root of the number of unique values in the column. A categorical column is considered to have high cardinality if the number of unique values is greater than the square root of the number of rows in the dataset.
upvoted 2 times
...
MithunDesai
2 years, 10 months ago
Selected Answer: C
I think C as it has 10000 categorical values
upvoted 2 times
...
hiromi
2 years, 10 months ago
Selected Answer: B
I think B is correct. Refs: - https://cloud.google.com/ai-platform/training/docs/algorithms/xgboost - https://stackoverflow.com/questions/26473233/in-preprocessing-data-with-high-cardinality-do-you-hash-first-or-one-hot-encode
upvoted 4 times
hiromi
2 years, 10 months ago
- https://cloud.google.com/ai-platform/training/docs/algorithms/xgboost#analysis
upvoted 1 times
...
...
mil_spyro
2 years, 10 months ago
Selected Answer: B
Answer is B. When the cardinality of the categorical column is very large, the best choice is binary encoding; however, that is not offered here, hence the one-hot hash option.
upvoted 1 times
mil_spyro
2 years, 10 months ago
https://www.analyticsvidhya.com/blog/2020/08/types-of-categorical-data-encoding/
upvoted 1 times
...
...
JeanEl
2 years, 11 months ago
Selected Answer: B
Ans : B
upvoted 1 times
...
seifou
2 years, 11 months ago
Selected Answer: B
B is correct
upvoted 1 times
...
ares81
2 years, 11 months ago
It should be B
upvoted 1 times
...
LearnSodas
2 years, 11 months ago
Selected Answer: A
Answer A, since with 10,000 unique values one-hot wouldn't be a good solution https://machinelearningmastery.com/how-to-prepare-categorical-data-for-deep-learning-in-python/
upvoted 3 times
etienne0
1 year, 8 months ago
I agree with A
upvoted 1 times
...
503b759
12 months ago
then you introduce ordinality into a categorical concept, which can mislead models
upvoted 1 times
...
...

Topic 1 Question 79

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 79 discussion

You need to train a natural language model to perform text classification on product descriptions that contain millions of examples and 100,000 unique words. You want to preprocess the words individually so that they can be fed into a recurrent neural network. What should you do?

  • A. Create a hot-encoding of words, and feed the encodings into your model.
  • B. Identify word embeddings from a pre-trained model, and use the embeddings in your model.
  • C. Sort the words by frequency of occurrence, and use the frequencies as the encodings in your model.
  • D. Assign a numerical value to each word from 1 to 100,000 and feed the values as inputs in your model.
Suggested Answer: B 🗳️

Comments

egdiaa
Highly Voted 2 years, 10 months ago
Answer is B: According to Google Docs here: - https://developers.google.com/machine-learning/guides/text-classification/ it is a Word Embedding case
upvoted 5 times
...
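A minimal sketch of what an embedding lookup does, in pure Python. This is illustrative only: in practice the table would come from a pre-trained model (e.g. word2vec or GloVe) with hundreds of dimensions, and the toy vocabulary and 3-dimensional vectors below are made up:

```python
# Toy vocabulary and embedding table; a real table is trained, not hand-written.
vocab = {"<unk>": 0, "red": 1, "shirt": 2, "cotton": 3}
embeddings = [
    [0.0, 0.0, 0.0],   # <unk>
    [0.9, 0.1, 0.0],   # red
    [0.1, 0.8, 0.3],   # shirt
    [0.2, 0.7, 0.4],   # cotton
]

def encode(text):
    """Map each word to its dense vector; unknown words fall back to <unk>."""
    return [embeddings[vocab.get(w, 0)] for w in text.lower().split()]

seq = encode("red cotton shirt")
assert len(seq) == 3 and len(seq[0]) == 3  # one low-dimensional vector per word
```

The resulting sequence of dense vectors is exactly the per-timestep input an RNN expects, which is why B beats a 100,000-wide one-hot (A) or an arbitrary integer id (D).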
OpenKnowledge
Most Recent 4 weeks ago
Selected Answer: B
Word embedding is a natural language processing (NLP) technique that converts words into numerical vectors, allowing computers to understand semantic relationships between words. Instead of using simple numerical IDs, word embeddings place words with similar meanings close to each other in a multi-dimensional space, enabling machines to perform tasks like analogy solving and better understand context
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 2 times
...
John_Pongthorn
2 years, 9 months ago
Selected Answer: B
B https://developers.google.com/machine-learning/guides/text-classification/step-3 https://developers.google.com/machine-learning/guides/text-classification/step-4
upvoted 2 times
...
ares81
2 years, 10 months ago
Selected Answer: B
Answer is B
upvoted 1 times
...
hiromi
2 years, 10 months ago
Selected Answer: B
B (I'm not sure) - https://developers.google.com/machine-learning/guides/text-classification/step-3#label_vectorization - https://developers.google.com/machine-learning/guides/text-classification/step-4 - https://towardsai.net/p/deep-learning/text-classification-with-rnn - https://towardsdatascience.com/pre-trained-word-embedding-for-text-classification-end2end-approach-5fbf5cd8aead
upvoted 2 times
hiromi
2 years, 10 months ago
- https://developers.google.com/machine-learning/crash-course/embeddings/translating-to-a-lower-dimensional-space
upvoted 1 times
...
...
LearnSodas
2 years, 11 months ago
Selected Answer: C
Bag of words is a good practice to represent and feed text at a DNN https://machinelearningmastery.com/gentle-introduction-bag-words-model/
upvoted 1 times
503b759
12 months ago
Probably BOW suffers from the high cardinality of the text (100k words); embeddings are typically lower-dimensional (hundreds, not thousands, of columns).
upvoted 1 times
...
...

Topic 1 Question 80

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 80 discussion

You work for an online travel agency that also sells advertising placements on its website to other companies. You have been asked to predict the most relevant web banner that a user should see next. Security is important to your company. The model latency requirements are 300ms@p99, the inventory is thousands of web banners, and your exploratory analysis has shown that navigation context is a good predictor. You want to Implement the simplest solution. How should you configure the prediction pipeline?

  • A. Embed the client on the website, and then deploy the model on AI Platform Prediction.
  • B. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Firestore for writing and for reading the user’s navigation context, and then deploy the model on AI Platform Prediction.
  • C. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Cloud Bigtable for writing and for reading the user’s navigation context, and then deploy the model on AI Platform Prediction.
  • D. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Memorystore for writing and for reading the user’s navigation context, and then deploy the model on Google Kubernetes Engine.
Suggested Answer: B 🗳️

Comments

hiromi
Highly Voted 2 years, 10 months ago
Selected Answer: C
C (same as question 49). Keywords: the inventory is thousands of web banners -> Bigtable; you want to implement the simplest solution -> AI Platform Prediction.
upvoted 11 times
tavva_prudhvi
2 years, 3 months ago
Yes, but in that question option B doesn't have a database. Firestore can handle thousands of web banners, right?
upvoted 2 times
dija123
1 month, 1 week ago
Firestore is a scalable document database, but for the extremely low-latency reads required to meet a 300ms p99 SLA, it would struggle.
upvoted 2 times
...
...
...
e707
Highly Voted 2 years, 6 months ago
Selected Answer: B
Here are some of the reasons why C is not as simple as B: Cloud Bigtable is a more complex database to set up and manage than Firestore. Cloud Bigtable is not as secure as Firestore. Cloud Bigtable is not as well-integrated with other Google Cloud services as Firestore. Therefore, B is the simpler solution that meets all of the requirements.
upvoted 6 times
...
OpenKnowledge
Most Recent 4 weeks ago
Selected Answer: C
Bigtable is a highly scalable, low-latency NoSQL wide-column database service designed for massive analytical and operational workloads. It excels at handling large datasets with high read and write throughput.
upvoted 1 times
...
192malba192
1 year, 3 months ago
go for B
upvoted 1 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: B
see e707
upvoted 1 times
...
ludovikush
1 year, 7 months ago
Selected Answer: C
as Hiromi said
upvoted 1 times
...
ludovikush
1 year, 8 months ago
Selected Answer: B
I would opt for B as we have requirement of retrieval latency
upvoted 1 times
...
Mickey321
1 year, 12 months ago
Selected Answer: B
Embed the client on the website, deploy the gateway on App Engine, and then deploy the model on AI Platform Prediction.
upvoted 1 times
...
Krish6488
1 year, 12 months ago
Selected Answer: B
I would go with Firestore as throughput or latency requirement provided in the question are possible with Firestore and bigTable may be an overkill. Had the scenario involved super large volumes of data, CBT would have taken precedence
upvoted 1 times
...
andresvelasco
2 years, 2 months ago
Selected Answer: B
I think B, based on "the simplest solution" consideration.
upvoted 1 times
...
tavva_prudhvi
2 years, 3 months ago
Selected Answer: B
the primary requirement mentioned in the original question is to implement the simplest solution. Firestore is a fully managed, serverless NoSQL database that can also handle thousands of web banners and dynamically changing user browsing history. It is designed for real-time data synchronization and can quickly update the most relevant web banner as the user browses different pages of the website. While Cloud Bigtable offers high performance and scalability, it is more complex to manage and is better suited for large-scale, high-throughput workloads. Firestore, on the other hand, is easier to implement and maintain, making it a more suitable choice for the simplest solution in this scenario.
upvoted 3 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 2 times
...
lucaluca1982
2 years, 6 months ago
Selected Answer: B
B for me
upvoted 1 times
...
ares81
2 years, 10 months ago
Selected Answer: B
B, for me.
upvoted 2 times
...
kn29
2 years, 10 months ago
I think C because of the latency requirements. Cloud Bigtable is built for low-latency access: https://cloud.google.com/bigtable
upvoted 3 times
tavva_prudhvi
2 years, 3 months ago
correct that Cloud Bigtable can provide better latency compared to Firestore, especially when dealing with very large datasets and high-throughput workloads. However, it's important to consider the trade-offs and the specific use case. For the given scenario, the latency requirements are 300ms@p99, which Firestore can handle effectively for thousands of web banners and dynamically changing user browsing history. Firestore is designed for real-time data synchronization and can quickly update the most relevant web banner as the user browses different pages on the website. While Cloud Bigtable can offer improved latency, it comes with added complexity in terms of management and configuration. If the primary goal is to implement the simplest solution while meeting the latency requirements, Firestore remains a more suitable choice for this use case.
upvoted 1 times
...
...
ares81
2 years, 11 months ago
I need a DB to store the banners, so not A. We're talking about thousands of banners, so not C. Memorystore means Redis (and other moving parts), so not D. The answer is B, for me.
upvoted 1 times
...

Topic 1 Question 81

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 81 discussion

Your data science team has requested a system that supports scheduled model retraining, Docker containers, and a service that supports autoscaling and monitoring for online prediction requests. Which platform components should you choose for this system?

  • A. Vertex AI Pipelines and App Engine
  • B. Vertex AI Pipelines, Vertex AI Prediction, and Vertex AI Model Monitoring
  • C. Cloud Composer, BigQuery ML, and Vertex AI Prediction
  • D. Cloud Composer, Vertex AI Training with custom containers, and App Engine
Suggested Answer: B 🗳️

Comments

John_Pongthorn
Highly Voted 2 years, 3 months ago
Selected Answer: B
Cloud Composer might be a good consideration if you are going for the Google Data Engineer cert, and App Engine is relevant to the DevOps cert. We are preparing for the Google ML cert, so unless the question states a specific requirement, we should emphasize the use of Vertex AI as much as possible.
upvoted 9 times
...
PhilipKoku
Most Recent 11 months, 1 week ago
Selected Answer: B
B) Vertex AI Pipelines
upvoted 1 times
...
rosenr0
1 year, 11 months ago
B. Vertex AI also supports Docker containers https://cloud.google.com/vertex-ai/docs/training/containers-overview
upvoted 2 times
...
CloudKida
2 years ago
Selected Answer: D
A custom container is a Docker image that you create to run your training application. By running your machine learning (ML) training job in a custom container, you can use ML frameworks, non-ML dependencies, libraries, and binaries that are not otherwise supported on Vertex AI. So we need Vertex AI custom containers for the Docker requirement; thus options A and B are out. App Engine allows developers to focus on what they do best: writing code. Based on Compute Engine, the App Engine flexible environment automatically scales your app up and down while also balancing the load. Customizable infrastructure: App Engine flexible environment instances are Compute Engine virtual machines, which means that you can take advantage of custom libraries, use SSH for debugging, and deploy your own Docker containers.
upvoted 2 times
...
M25
2 years ago
Selected Answer: B
Went with B
upvoted 2 times
...
e707
2 years ago
Selected Answer: D
I think it's D. B does not support Docker containers, does it?
upvoted 1 times
e707
2 years ago
I can't change the voting but It's B.
upvoted 2 times
...
...
Sas02
2 years ago
Shouldn't it be A? https://cloud.google.com/appengine/docs/standard/scheduling-jobs-with-cron-yaml
upvoted 1 times
...
behzadsw
2 years, 4 months ago
Selected Answer: B
Vote for B
upvoted 1 times
...
hiromi
2 years, 4 months ago
Selected Answer: B
Vote for B
upvoted 3 times
...
mil_spyro
2 years, 4 months ago
Selected Answer: D
D is the only option that provides scheduled model retraining
upvoted 1 times
...
ares81
2 years, 5 months ago
Selected Answer: C
Serving is Vertex AI Prediction, but the monitoring in the question is not the one of answer B (which is tied to the model). The correct answer is C.
upvoted 1 times
ares81
2 years, 4 months ago
I changed my mind. It's D.
upvoted 1 times
...
...
LearnSodas
2 years, 5 months ago
Selected Answer: B
Everything is possible on Vertex AI
upvoted 3 times
mil_spyro
2 years, 4 months ago
Scheduling is not possible without the Cloud Scheduler https://cloud.google.com/vertex-ai/docs/pipelines/schedule-cloud-scheduler
upvoted 2 times
hiromi
2 years, 4 months ago
I think Vertex AI Pipeline includes schedule/trigger runs, so my vote is B
upvoted 3 times
...
dija123
1 month, 1 week ago
Cloud Scheduler is more advanced, but you can schedule pipelines within Vertex AI.
upvoted 1 times
...
...
...

Topic 1 Question 82

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 82 discussion

You are profiling the performance of your TensorFlow model training time and notice a performance issue caused by inefficiencies in the input data pipeline for a single 5 terabyte CSV file dataset on Cloud Storage. You need to optimize the input pipeline performance. Which action should you try first to increase the efficiency of your pipeline?

  • A. Preprocess the input CSV file into a TFRecord file.
  • B. Randomly select a 10 gigabyte subset of the data to train your model.
  • C. Split into multiple CSV files and use a parallel interleave transformation.
  • D. Set the reshuffle_each_iteration parameter to true in the tf.data.Dataset.shuffle method.
Suggested Answer: C 🗳️

Comments

pinimichele01
Highly Voted 1 year, 6 months ago
Selected Answer: C
Converting a large 5 terabyte CSV file to a TFRecord can be a time-consuming process, and you would still be dealing with a single large file.
upvoted 8 times
...
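Option C maps to tf.data's interleave transformation: split the 5 TB file into shards and read them concurrently. The pure-Python sketch below shows only the round-robin ordering, not the parallelism; in TensorFlow this would be tf.data.Dataset.interleave over the shard filenames with num_parallel_calls set to tf.data.AUTOTUNE:

```python
def interleave(shards):
    """Round-robin over several record iterators until all are exhausted.

    Splitting one huge CSV into shards and interleaving them means no
    single sequential reader is the input-pipeline bottleneck.
    """
    iters = [iter(s) for s in shards]
    while iters:
        alive = []
        for it in iters:
            try:
                yield next(it)
                alive.append(it)  # keep iterators that still have records
            except StopIteration:
                pass
    # exhausted iterators are dropped each round
            iters_next = alive
        iters = alive

# Three CSV shards standing in for the split 5 TB file.
shard_a = ["a1", "a2"]
shard_b = ["b1", "b2", "b3"]
shard_c = ["c1"]
assert list(interleave([shard_a, shard_b, shard_c])) == [
    "a1", "b1", "c1", "a2", "b2", "b3"
]
```

This is why C is a low-cost first step: it reuses the existing CSV data, whereas rewriting 5 TB to TFRecord (A) is itself a large job, even if TFRecord ultimately reads faster.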
OpenKnowledge
Most Recent 1 month, 1 week ago
Selected Answer: C
The problem is asking about the very 1st step to try to resolve the issue. C should be the very 1st step to try. A should be the next step if C does not help
upvoted 2 times
...
b7ad1d9
1 month, 3 weeks ago
Selected Answer: A
TFRecord is better as a first step
upvoted 1 times
...
d83229d
2 months, 2 weeks ago
Selected Answer: A
I would go with A, since even with parallel interleave and splitting you would be reading text data. TFRecord is a pre-parsed binary format, significantly faster to read.
upvoted 1 times
...
5091a99
8 months, 1 week ago
Selected Answer: A
This is a bad question. But IMHO, the answer is A. - TFRecord will improve read speeds with its binary format. Presumably the large file was there for a reason, possibly the output of an upstream process whose data may change in the future. TFRecord is a straightforward FIRST step as part of a pipeline. - The other option is parallel interleave. It also improves read speeds, but it is not as straightforward as a first step, and it leaves lots of files that need version control.
upvoted 2 times
...
NamitSehgal
8 months, 2 weeks ago
A. Preprocessing your data into TFRecord format can significantly improve I/O performance and reduce the time spent on parsing and loading data, which is critical for optimizing the input pipeline for large-scale datasets.
upvoted 1 times
...
bc3f222
8 months, 3 weeks ago
Selected Answer: A
according to the official doc A, C seems to pre TFX solution
upvoted 1 times
...
phani49
10 months, 3 weeks ago
Selected Answer: A
Based on the official documentation, Option A (converting to TFRecord format) is actually the correct first action to try, and the claim is incorrect. Why TFRecord is the Best First Option TFRecord format is specifically recommended for large datasets because: - It provides extremely high throughput when reading from Cloud Storage, especially for large-scale training[2] - It's the recommended format for structured data and large files[2] - It's designed for efficient serialization of structured data and optimal performance with TensorFlow
upvoted 2 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: C
c is the right answer
upvoted 1 times
...
Prakzz
1 year, 4 months ago
Selected Answer: A
Preprocessing the input CSV file into a TFRecord file optimizes the input data pipeline by enabling more efficient reading and processing. TFRecord is a binary format that is faster to read and more efficient for TensorFlow to process compared to CSV, which is a text-based format. This change can significantly reduce the time spent on data input operations during model training.
upvoted 4 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: A
A) Converting the CSV file into TFRecord is more efficient than processing CSVs in parallel (C).
upvoted 1 times
...
tavva_prudhvi
2 years ago
Selected Answer: C
While preprocessing the input CSV file into a TFRecord file (Option A) can improve the performance of your input pipeline, it is not the first action to try in this situation. Converting a large 5 terabyte CSV file to a TFRecord can be a time-consuming process, and you would still be dealing with a single large file.
upvoted 1 times
...
andresvelasco
2 years, 2 months ago
Selected Answer: C
I think C, based on the consideration "Which action should you try first", meaning it should be less impactful to continue using CSV.
upvoted 1 times
...
TNT87
2 years, 5 months ago
Selected Answer: C
https://www.tensorflow.org/guide/data_performance#best_practice_summary
upvoted 2 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
e707
2 years, 6 months ago
Selected Answer: C
Option A, preprocess the input CSV file into a TFRecord file, is not as good because it requires additional processing time. Hence, I think C is the best choice.
upvoted 1 times
...
frangm23
2 years, 6 months ago
Selected Answer: A
I think it could be A. https://cloud.google.com/architecture/best-practices-for-ml-performance-cost#preprocess_the_data_once_and_save_it_as_a_tfrecord_file
upvoted 1 times
...

Topic 1 Question 83

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 83 discussion

You need to design an architecture that serves asynchronous predictions to determine whether a particular mission-critical machine part will fail. Your system collects data from multiple sensors from the machine. You want to build a model that will predict a failure in the next N minutes, given the average of each sensor’s data from the past 12 hours. How should you design the architecture?

  • A. 1. HTTP requests are sent by the sensors to your ML model, which is deployed as a microservice and exposes a REST API for prediction
    2. Your application queries a Vertex AI endpoint where you deployed your model.
    3. Responses are received by the caller application as soon as the model produces the prediction.
  • B. 1. Events are sent by the sensors to Pub/Sub, consumed in real time, and processed by a Dataflow stream processing pipeline.
    2. The pipeline invokes the model for prediction and sends the predictions to another Pub/Sub topic.
    3. Pub/Sub messages containing predictions are then consumed by a downstream system for monitoring.
  • C. 1. Export your data to Cloud Storage using Dataflow.
    2. Submit a Vertex AI batch prediction job that uses your trained model in Cloud Storage to perform scoring on the preprocessed data.
    3. Export the batch prediction job outputs from Cloud Storage and import them into Cloud SQL.
  • D. 1. Export the data to Cloud Storage using the BigQuery command-line tool
    2. Submit a Vertex AI batch prediction job that uses your trained model in Cloud Storage to perform scoring on the preprocessed data.
    3. Export the batch prediction job outputs from Cloud Storage and import them into BigQuery.
Suggested Answer: B 🗳️

Comments

OpenKnowledge
1 month, 1 week ago
Selected Answer: B
Asynchronous processing does not mean batch processing. Asynchronous processing is a form of online processing; it involves receiving and acting on requests in real-time, rather than waiting for a batch of data to accumulate. In asynchronous processing, a system performs tasks without waiting for the operation to complete and return a result to the user, allowing for faster responses and better system resource utilization.
upvoted 1 times
...
PhilipKoku
11 months, 1 week ago
Selected Answer: B
B) Pub/Sub & DataFlow
upvoted 1 times
...
inc_dev_ml_001
1 year ago
Selected Answer: C
The simplest solution that can support an eventual batch prediction (triggered by Pub/Sub), and even semi-real-time prediction.
upvoted 1 times
...
Werner123
1 year, 2 months ago
Selected Answer: B
Needs to be real time not batch. The data needs to be processed as a stream since multiple sensors are used. pawan94 is right. https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#online_real-time_prediction
upvoted 1 times
...
pawan94
1 year, 4 months ago
Here you go: the answer provided by Google itself. I don't understand why people would use batch prediction when it's sensor data and online prediction is asynchronous as well. https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#offline_batch_prediction:~:text=Predictive%20maintenance%3A%20asynchronously%20predicting%20whether%20a%20particular%20machine%20part%20will%20fail%20in%20the%20next%20N%20minutes%2C%20given%20the%20averages%20of%20the%20sensor%27s%20data%20in%20the%20past%2030%20minutes.
upvoted 2 times
...
vale_76_na_xxx
1 year, 4 months ago
It refers to asynchronous prediction. I'd go with C.
upvoted 1 times
...
rosenr0
1 year, 11 months ago
Selected Answer: D
D. I think we have to query data from the past 12 hours for the prediction, and that's the reason for exporting the data to Cloud Storage. Also, the predictions don't have to be real time.
upvoted 2 times
...
M25
2 years ago
Selected Answer: B
Went with B
upvoted 1 times
...
JamesDoe
2 years, 1 month ago
Selected Answer: B
B. Online prediction, and need decoupling with Pub/Sub to make it asynchronous. Option A is synchronous.
upvoted 2 times
...
tavva_prudhvi
2 years, 1 month ago
Option C may not be the best choice for this use case because it involves using a batch prediction job in Vertex AI to perform scoring on preprocessed data. Batch prediction jobs are more suitable for scenarios where data is processed in batches, and results can be generated over a longer period, such as daily or weekly. In this use case, the requirement is to predict whether a machine part will fail in the next N minutes, given the average of each sensor's data from the past 12 hours. Therefore, real-time processing and prediction are necessary. Batch prediction jobs are not designed for real-time processing, and there may be a delay in receiving the predictions. Option B, on the other hand, is designed for real-time processing and prediction. The Pub/Sub and Dataflow components allow for real-time processing of incoming sensor data, and the trained ML model can be invoked for prediction in real-time. This makes it ideal for mission-critical applications where timely predictions are essential.
upvoted 2 times
...
tavva_prudhvi
2 years, 1 month ago
Its B, This architecture leverages the strengths of Pub/Sub, Dataflow, and Vertex AI. The system collects data from multiple sensors, which sends events to Pub/Sub. Pub/Sub can handle the high volume of incoming data and can buffer messages to prevent data loss. A Dataflow stream processing pipeline can consume the events in real-time and perform feature engineering and data preprocessing before invoking the trained ML model for prediction. The predictions are then sent to another Pub/Sub topic, where they can be consumed by a downstream system for monitoring. This architecture is highly scalable, resilient, and efficient, as it can handle large volumes of data and perform real-time processing and prediction. It also separates concerns by using a separate pipeline for data processing and another for prediction, making it easier to maintain and modify the system.
upvoted 1 times
...
enghabeth
2 years, 3 months ago
Selected Answer: B
If you have sensors in your architecture, you need Pub/Sub.
upvoted 1 times
...
John_Pongthorn
2 years, 3 months ago
Selected Answer: B
B is most likely. If you search for "asynchronous" on this page, it appears the question wants to focus on online prediction with asynchronous mode: https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#online_real-time_prediction The question is obviously the same as what is explained in that section: "Predictive maintenance: asynchronously predicting whether a particular machine part will fail in the next N minutes, given the averages of the sensor's data in the past 30 minutes." After that, you can take a closer look at figure 3 and read what it describes. C and D are the offline solution, but using different tools. https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#offline_batch_prediction
upvoted 2 times
...
John_Pongthorn
2 years, 3 months ago
Asynchronous prediction = batch prediction https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#offline_batch_prediction
upvoted 1 times
John_Pongthorn
2 years, 3 months ago
"Asynchronous prediction = batch prediction" is incorrect; I was reckless in reading this article. Admin can delete my comment above. I was mistaken.
upvoted 1 times
...
...
hiromi
2 years, 4 months ago
Selected Answer: B
B "Predictive maintenance: asynchronously predicting whether a particular machine part will fail in the next N minutes, given the averages of the sensor's data in the past 30 minutes." https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#offline_batch_prediction
upvoted 3 times
hiromi
2 years, 4 months ago
- https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#online_real-time_prediction
upvoted 1 times
...
...
mil_spyro
2 years, 4 months ago
Selected Answer: B
Answer is B. https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#handling_dynamic_real-time_features
upvoted 1 times
...
ares81
2 years, 5 months ago
Selected Answer: C
C, for me.
upvoted 1 times
...

Topic 1 Question 84

Exam Professional Machine Learning Engineer topic 1 question 84 discussion

Your company manages an application that aggregates news articles from many different online sources and sends them to users. You need to build a recommendation model that will suggest articles to readers that are similar to the articles they are currently reading. Which approach should you use?

  • A. Create a collaborative filtering system that recommends articles to a user based on the user’s past behavior.
  • B. Encode all articles into vectors using word2vec, and build a model that returns articles based on vector similarity.
  • C. Build a logistic regression model for each user that predicts whether an article should be recommended to a user.
  • D. Manually label a few hundred articles, and then train an SVM classifier based on the manually classified articles that categorizes additional articles into their respective categories.
Suggested Answer: B 🗳️

Comments

b7ad1d9
1 month, 3 weeks ago
Selected Answer: B
The key word is "similarity", and there is not much focus on the user's past behavior (since this is a NEW user). This points to word2vec instead of collaborative filtering. Collaborative filtering is based on a rich past history of user-item interactions.
upvoted 3 times
...
gscharly
1 year ago
Selected Answer: B
Went with B
upvoted 1 times
...
M25
2 years ago
Selected Answer: B
Went with B
upvoted 1 times
...
TNT87
2 years ago
Selected Answer: B
https://cloud.google.com/blog/topics/developers-practitioners/meet-ais-multitool-vector-embeddings Answer B
upvoted 3 times
...
JamesDoe
2 years, 1 month ago
Selected Answer: B
"Currently reading" is the keyword here; you need B for that. A won't work, since it would be based on, e.g., all reading history and not the article currently being read.
upvoted 4 times
...
tavva_prudhvi
2 years, 1 month ago
Option A, creating a collaborative filtering system, may not be ideal for this use case because it relies on user behavior data, which may not be available or sufficient for new users or for users who have not interacted with the system much. Option C, building a logistic regression model for each user, may not be scalable because it requires building a separate model for each user, which can become difficult to manage as the number of users increases. Option D, manually labeling articles and training an SVM classifier, may not be as effective as the word2vec approach because it relies on manual labeling, which can be time-consuming and may not capture the full semantic meaning of the articles. Additionally, SVMs may not be as effective as neural network-based approaches like word2vec for capturing complex relationships between words and articles.
upvoted 3 times
...
JJJJim
2 years, 4 months ago
Selected Answer: B
word2vec can easily find similar articles, but collaborative filtering isn't well suited here.
upvoted 2 times
...
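The vector-similarity idea behind option B can be sketched in a few lines of plain Python. This is a toy illustration only: the article names and 3-dimensional vectors below are hypothetical stand-ins for real word2vec embeddings (which would typically be averaged word vectors per article).

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def recommend(current_vec, catalog, top_k=2):
    # Rank catalog articles by similarity to the article being read.
    scored = sorted(
        ((title, cosine_similarity(current_vec, vec)) for title, vec in catalog.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return [title for title, _ in scored[:top_k]]

# Hypothetical article embeddings.
catalog = {
    "markets rally": [0.9, 0.1, 0.0],
    "fed rate decision": [0.8, 0.2, 0.1],
    "local sports final": [0.0, 0.1, 0.9],
}
current = [0.85, 0.15, 0.05]  # embedding of the reader's current article
print(recommend(current, catalog))
```

Note that this needs no user history at all, which is why it also works for brand-new users, unlike collaborative filtering.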
hiromi
2 years, 4 months ago
Selected Answer: B
B https://towardsdatascience.com/recommending-news-articles-based-on-already-read-articles-627695221fe8
upvoted 3 times
...
mil_spyro
2 years, 4 months ago
Selected Answer: B
Answer B
upvoted 1 times
...
ares81
2 years, 5 months ago
Selected Answer: B
Collaborative filtering looks at other users; knowledge-based looks at me. Answer B is the most knowledge-based among these.
upvoted 2 times
...
YangG
2 years, 5 months ago
Selected Answer: A
"similar to they are currently reading". it should be a collaborative filtering problem
upvoted 2 times
taxberg
2 years, 3 months ago
No. Collaborative filtering recommends articles other people read that are not necessarily similar to what the person is reading. Those people are chosen based on being similar to the person in question, not on the article.
upvoted 3 times
...
...
LearnSodas
2 years, 5 months ago
Selected Answer: B
Answer B
upvoted 2 times
...

Topic 1 Question 85

Exam Professional Machine Learning Engineer topic 1 question 85 discussion

You work for a large social network service provider whose users post articles and discuss news. Millions of comments are posted online each day, and more than 200 human moderators constantly review comments and flag those that are inappropriate. Your team is building an ML model to help human moderators check content on the platform. The model scores each comment and flags suspicious comments to be reviewed by a human. Which metric(s) should you use to monitor the model’s performance?

  • A. Number of messages flagged by the model per minute
  • B. Number of messages flagged by the model per minute confirmed as being inappropriate by humans.
  • C. Precision and recall estimates based on a random sample of 0.1% of raw messages each minute sent to a human for review
  • D. Precision and recall estimates based on a sample of messages flagged by the model as potentially inappropriate each minute
Suggested Answer: D 🗳️

Comments

hiromi
Highly Voted 2 years, 10 months ago
Selected Answer: D
D - https://cloud.google.com/natural-language/automl/docs/beginners-guide - https://cloud.google.com/vertex-ai/docs/text-data/classification/evaluate-model
upvoted 13 times
...
andresvelasco
Highly Voted 2 years, 2 months ago
Selected Answer: C
A. Number of messages flagged by the model per minute => NO, not a measure of model performance.
B. Number of messages flagged by the model per minute confirmed as being inappropriate by humans => DON'T THINK SO, because we would need the total number of (flagged?) messages.
C. Precision and recall estimates based on a random sample of 0.1% of raw messages each minute sent to a human for review => I think YES, because as I understand it that would be based on a sample of ALL messages, not just the ones that have been flagged.
D. Precision and recall estimates based on a sample of messages flagged by the model as potentially inappropriate each minute => I think NO, because the sample includes only flagged messages, meaning positives, so you cannot really measure recall.
upvoted 8 times
tavva_prudhvi
2 years ago
The main issue with option C is that it uses a random sample of only 0.1% of raw messages. This random sample might not contain enough examples of inappropriate content to accurately assess the model's performance. Since the majority of messages on the platform are likely appropriate, the random sample may not capture enough inappropriate content for a robust evaluation.
upvoted 5 times
josiejojo
9 months ago
But how can you calculate recall with just flagged samples? How could you get a view of false negatives? This is surely key to a problem like this where we don't want to let inappropriate posts go unflagged.
upvoted 2 times
...
...
...
OpenKnowledge
Most Recent 1 month, 1 week ago
Selected Answer: B
About C: too few and random samples -> possibility of not having enough samples (or even no samples) for True Positives (correctly flagged), False Positives (incorrectly flagged) and False Negatives (incorrectly NOT flagged). So C is out. About D: only flagged samples are used. Although precision can be calculated from those samples using True Positives (correctly flagged) and False Positives (incorrectly flagged), recall cannot be calculated from flagged samples, because the recall calculation needs False Negatives (incorrectly NOT flagged). So D is out. So B is the answer.
upvoted 1 times
...
b7ad1d9
1 month, 3 weeks ago
Selected Answer: D
The gold standard answer is sampling flagged messages AND sampling a small number of unflagged messages to get both precision and recall. So option D + another step. Option C sounds sus because of the random 0.1% sampling percentage. For exam purposes, go with option D
upvoted 1 times
...
d6c984b
6 months, 2 weeks ago
Selected Answer: C
In order to estimate recall, you need to sample from ALL potential messages. The system is helping a team of 200 human reviewers who were previously performing the job manually. Even by sparing 1 human reviewer (hopefully in rotation), it's possible to review 0.5% of the original throughput capacity. Obviously, once the system is deployed and proves to be efficient, the team of 200 reviewers will shrink.
upvoted 1 times
...
phani49
10 months, 3 weeks ago
Selected Answer: C
C is correct: a random sample of raw messages provides an unbiased evaluation of the model's performance across all types of content. Option D is problematic because it creates a biased sample by only reviewing flagged messages, and it cannot detect false negatives (missed inappropriate content).
upvoted 2 times
...
amene
1 year, 1 month ago
Selected Answer: B
I went with B. Remember how to calculate recall: TP/(TP+FN). Since a "sample of messages flagged by the model" contains only positive cases, you won't have your negative cases reviewed by a human, therefore you won't have FN, therefore it's not D. I also believe that 0.1% of raw messages is going to contain too few positive cases, therefore not C. That leaves option B, which is not optimal, but it is the best we can do in this situation.
upvoted 1 times
...
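The point about recall can be made concrete with a toy calculation in plain Python (the labels below are purely hypothetical): from a sample containing only model-flagged messages you can count TP and FP, so precision is computable, but every FN lies outside that sample by construction, so recall cannot be estimated from it.

```python
# Each message: (flagged_by_model, actually_inappropriate) -- hypothetical toy data.
messages = [
    (True, True),    # TP: flagged and inappropriate
    (True, False),   # FP: flagged but fine
    (True, True),    # TP
    (False, True),   # FN: missed inappropriate message
    (False, False),  # TN
    (False, False),  # TN
]

tp = sum(1 for f, y in messages if f and y)
fp = sum(1 for f, y in messages if f and not y)
fn = sum(1 for f, y in messages if not f and y)

precision = tp / (tp + fp)  # needs only flagged messages
recall = tp / (tp + fn)     # needs the missed (unflagged) inappropriate ones too

# A flagged-only sample (as in option D) contains no unflagged messages,
# so it contains zero false negatives by construction:
flagged_only = [(f, y) for f, y in messages if f]
fn_in_flagged_sample = sum(1 for f, y in flagged_only if not f and y)

print(precision, recall, fn_in_flagged_sample)
```

This is why estimating recall requires some review of unflagged traffic, whatever the sampling scheme.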
baimus
1 year, 2 months ago
Selected Answer: C
It is absolutely not possible to calculate recall with D, because we only have positives in the sample and we need false negatives. Because of the high quantity of total data, 0.1% is fine; the answer is C.
upvoted 1 times
...
ludovikush
1 year, 7 months ago
Selected Answer: D
Precision and recall are critical metrics for evaluating the performance of classification models, especially in contexts where both the accuracy of positive predictions (precision) and the ability to identify all positive instances (recall) are important. In this case: Precision (the proportion of messages flagged by the model as inappropriate that were actually inappropriate) helps ensure that the model minimizes the burden on human moderators by not flagging too many false positives, which could overwhelm them. Recall (the proportion of actual inappropriate messages that were correctly flagged by the model) ensures that the model is effective at catching as many inappropriate messages as possible, reducing the risk of harmful content being missed.
upvoted 5 times
...
etienne0
1 year, 8 months ago
Selected Answer: C
I go with C
upvoted 1 times
...
pmle_nintendo
1 year, 8 months ago
Selected Answer: D
Let's consider below hypothetical scenario: Total number of comments per minute: 10,000 Comments actually inappropriate: 500 If we use a random sample of only 0.1% of raw messages (10 comments) for evaluation, there's a high chance that this small sample may not include any or only a few inappropriate comments. As a result, the precision and recall estimates based on this sample may be skewed, leading to unreliable assessments of the model's performance. Thus, C is ruled out.
upvoted 3 times
...
Werner123
1 year, 8 months ago
Selected Answer: D
C does not make sense to me since it is a very small random sample. It is also only messages that have been sent to humans for review meaning that there is bias in that result set.
upvoted 2 times
...
b1a8fae
1 year, 10 months ago
D: only considering observations flagged by the model means we don't control for false negatives (approved but actually inappropriate messages). B seems like a better option to me: the wording confuses me a bit, but I understand it as the true and false positives (human-flagged comments and their modelled labels).
upvoted 1 times
...
Mickey321
1 year, 12 months ago
Selected Answer: D
In favor of D
upvoted 2 times
...
pico
1 year, 12 months ago
Selected Answer: C
Given the context of content moderation, a balanced approach is often preferred. Therefore, option C, precision and recall estimates based on a random sample of raw messages, is a good choice. It provides a holistic view of the model's performance, taking into account both false positives (precision) and false negatives (recall), and it reflects how well the model is handling the entire dataset.
upvoted 1 times
...
Krish6488
1 year, 12 months ago
Selected Answer: D
A --> Conveys the model's activity level but not accuracy.
B --> Accuracy to some extent, but won't give the full picture, as it does not account for false negatives.
C --> Using a random sample of the raw messages allows you to estimate precision and recall for the overall activity, not just the flagged content.
D --> Specifically measures performance on the subset of data that the model flagged.
Both C and D work well in this case, but the specificity is higher in option D, hence I will go with D.
upvoted 2 times
...
Selected Answer: C
Google Cloud used to have a service called "continuous evaluation", where human labelers classify data to establish a ground truth. Thinking along those lines, the answer is C as it's the logical equivalent of that service. https://cloud.google.com/ai-platform/prediction/docs/continuous-evaluation
upvoted 1 times
...

Topic 1 Question 86

Exam Professional Machine Learning Engineer topic 1 question 86 discussion

You are a lead ML engineer at a retail company. You want to track and manage ML metadata in a centralized way so that your team can have reproducible experiments by generating artifacts. Which management solution should you recommend to your team?

  • A. Store your tf.logging data in BigQuery.
  • B. Manage all relational entities in the Hive Metastore.
  • C. Store all ML metadata in Google Cloud’s operations suite.
  • D. Manage your ML workflows with Vertex ML Metadata.
Suggested Answer: D 🗳️

Comments

hiromi
Highly Voted 2 years, 10 months ago
Selected Answer: D
D - https://cloud.google.com/vertex-ai/docs/ml-metadata/tracking
upvoted 6 times
...
OpenKnowledge
Most Recent 1 month, 1 week ago
Selected Answer: D
Vertex ML Metadata provides a managed ML metadata store to track and analyze the entire lifecycle of machine learning systems, allowing for better debugging, auditing, and performance comparison of ML systems and their artifacts. It enables you to visualize the lineage of data, models, and pipelines, helping you understand how different components interact and what led to a particular outcome. This structured approach to metadata management improves reproducibility, transparency, and overall management of complex ML workflows on Google Cloud's Vertex AI platform.
upvoted 1 times
...
PJ_Exams
1 year, 4 months ago
Selected Answer: D
Correct
upvoted 2 times
...
SubbuJV
1 year, 9 months ago
Selected Answer: D
Selected Answer: D
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: D
Went with D
upvoted 1 times
...
enghabeth
2 years, 9 months ago
Selected Answer: D
totally D
upvoted 2 times
...
ares81
2 years, 11 months ago
Selected Answer: D
This should be an easy D.
upvoted 3 times
...
LearnSodas
2 years, 11 months ago
Selected Answer: D
https://codelabs.developers.google.com/vertex-mlmd-pipelines?hl=id&authuser=6#0
upvoted 3 times
...

Topic 1 Question 87

Exam Professional Machine Learning Engineer topic 1 question 87 discussion

You have been given a dataset with sales predictions based on your company’s marketing activities. The data is structured and stored in BigQuery, and has been carefully managed by a team of data analysts. You need to prepare a report providing insights into the predictive capabilities of the data. You were asked to run several ML models with different levels of sophistication, including simple models and multilayered neural networks. You only have a few hours to gather the results of your experiments. Which Google Cloud tools should you use to complete this task in the most efficient and self-serviced way?

  • A. Use BigQuery ML to run several regression models, and analyze their performance.
  • B. Read the data from BigQuery using Dataproc, and run several models using SparkML.
  • C. Use Vertex AI Workbench user-managed notebooks with scikit-learn code for a variety of ML algorithms and performance metrics.
  • D. Train a custom TensorFlow model with Vertex AI, reading the data from BigQuery featuring a variety of ML algorithms.
Suggested Answer: A 🗳️

Comments

Werner123
Highly Voted 1 year, 2 months ago
Selected Answer: A
You only have a few hours. The dataset is in BQ. The dataset is carefully managed. BQML it is.
upvoted 8 times
...
OpenKnowledge
Most Recent 1 month, 1 week ago
Selected Answer: A
BigQuery ML models can integrate with Vertex AI and leverage Vertex AI Experiments and Vertex AI Explainable AI. Explainable AI allows users to gain insights into the predictions made by their BigQuery ML models.
upvoted 2 times
...
bc3f222
8 months, 3 weeks ago
Selected Answer: A
Not enough time, and the data is already in BigQuery, therefore BQML.
upvoted 1 times
...
ludovikush
1 year, 2 months ago
Selected Answer: C
I agree with pico answer
upvoted 1 times
...
iieva
1 year, 3 months ago
Selected Answer: A
All deep neural networks are multilayered neural networks, but not all multilayered neural networks are necessarily deep. The term "deep" is used to emphasize the depth of the network in the context of having many hidden layers, which has been shown to be effective for learning hierarchical representations of complex patterns in data. Since BQ allows the creation of DNNs (https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-dnn-models), it should be A.
upvoted 4 times
...
pico
1 year, 8 months ago
Selected Answer: C
Vertex AI Workbench provides user-managed notebooks that allow you to run Python code using libraries like scikit-learn, TensorFlow, and more. You can easily connect to your BigQuery dataset from within the notebook, extract the data, and perform data preprocessing. You can then experiment with different ML algorithms available in scikit-learn and track performance metrics. It provides flexibility, control, and the ability to run various models quickly.
upvoted 3 times
pico
1 year, 8 months ago
Not A. BigQuery ML is convenient for quick model training and predictions within BigQuery itself, but it has limitations in terms of the variety of ML algorithms and customization options it offers. It may not be the best choice for running more sophisticated ML models or extensive experiments, and option A only mentions regression models.
upvoted 2 times
...
...
MTTTT
1 year, 9 months ago
Selected Answer: C
I think multilayered neural networks need to be trained externally from BQ ML as stated here: https://cloud.google.com/bigquery/docs/bqml-introduction
upvoted 1 times
MTTTT
1 year, 9 months ago
nvm you can import DNN in BQ
upvoted 1 times
...
...
SamuelTsch
1 year, 10 months ago
Selected Answer: A
According to the question, you don't have enough time. B, C, and D need much more time to set up the service or write the code. Also, the data is already in BigQuery, so BQML should be the fastest way. Besides, BQML supports XGBoost and NN models as well.
upvoted 2 times
...
Jarek7
1 year, 10 months ago
Selected Answer: C
The question says that "you were asked to run several ML models with different levels of sophistication, including simple models and multilayered neural networks". BQ ML doesn't allow this: BQ ML provides only simple regression/classification models. It is not about training these "sophisticated models" but only running them, so you can easily do it within a few hours with notebooks.
upvoted 2 times
...
M25
2 years ago
Selected Answer: A
Went with A
upvoted 2 times
...
lucaluca1982
2 years ago
Selected Answer: C
C allows executing more complex tests.
upvoted 1 times
tavva_prudhvi
1 year, 9 months ago
However, given the limited time constraint of a few hours and the fact that the data is already stored in BigQuery, option A is more efficient. BigQuery ML allows you to quickly create and evaluate ML models directly within BigQuery, without the need to move the data or set up a separate environment. This makes it faster and more convenient for running several regression models and analyzing their performance within the given time frame.
upvoted 1 times
...
...
FherRO
2 years, 2 months ago
Selected Answer: A
B,C,D requires coding. You only have some hours, A is the fastest.
upvoted 2 times
...
hiromi
2 years, 4 months ago
Selected Answer: A
I vote for A
upvoted 3 times
...
ares81
2 years, 5 months ago
Selected Answer: A
It's A.
upvoted 2 times
...
LearnSodas
2 years, 5 months ago
Selected Answer: A
I will go with A, since it's the fastest way to do it. Custom training in Vertex AI requires time, and writing scikit-learn models in notebooks does too.
upvoted 2 times
...

Topic 1 Question 88

Exam Professional Machine Learning Engineer topic 1 question 88 discussion

You are an ML engineer at a bank. You have developed a binary classification model using AutoML Tables to predict whether a customer will make loan payments on time. The output is used to approve or reject loan requests. One customer’s loan request has been rejected by your model, and the bank’s risks department is asking you to provide the reasons that contributed to the model’s decision. What should you do?

  • A. Use local feature importance from the predictions.
  • B. Use the correlation with target values in the data summary page.
  • C. Use the feature importance percentages in the model evaluation page.
  • D. Vary features independently to identify the threshold per feature that changes the classification.
Suggested Answer: A 🗳️

Comments

shankalman717
Highly Voted 1 year, 2 months ago
Selected Answer: A
To access local feature importance in AutoML Tables, you can use the "Explain" feature, which shows the contribution of each feature to the prediction for a specific example. This will help you identify the most important features that contributed to the loan request being rejected. Option B, using the correlation with target values in the data summary page, may not provide the most accurate explanation as it looks at the overall correlation between the features and target variable, rather than the contribution of each feature to a specific prediction. Option C, using the feature importance percentages in the model evaluation page, may not provide a sufficient explanation for the specific prediction, as it shows the importance of each feature across all predictions, rather than for a specific prediction. Option D, varying features independently to identify the threshold per feature that changes the classification, is not recommended as it can be time-consuming and does not provide a clear explanation for why the loan request was rejected
upvoted 12 times
...
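For intuition, local feature importance for a single prediction can be sketched with a linear model, where each feature's contribution relative to a baseline is exact. AutoML Tables actually uses the sampled Shapley method under the hood; the weights, baseline, and loan features below are entirely made up for illustration.

```python
# Toy linear scoring model: score = bias + sum(w_i * x_i).
# All feature names, weights and values are hypothetical.
weights = {"income": 0.5, "debt_ratio": -0.8, "late_payments": -0.6}
bias = 0.2
baseline = {"income": 1.0, "debt_ratio": 0.3, "late_payments": 0.0}  # "average" customer
customer = {"income": 0.6, "debt_ratio": 0.7, "late_payments": 2.0}  # rejected customer

def score(x):
    return bias + sum(weights[k] * x[k] for k in weights)

# For a linear model, the local attribution of feature i is exactly
# w_i * (x_i - baseline_i), and the attributions sum to the score delta.
attributions = {k: weights[k] * (customer[k] - baseline[k]) for k in weights}
delta = score(customer) - score(baseline)
assert abs(sum(attributions.values()) - delta) < 1e-9

# The most negative contributor "explains" the rejection for this one customer.
worst = min(attributions, key=attributions.get)
print(worst, attributions[worst])
```

Note this is a per-prediction (local) answer: global feature importance (option C) would average such effects over all predictions and could not explain this specific rejection.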
JamesDoe
Highly Voted 1 year, 1 month ago
Selected Answer: A
Local, not global since they asked about one specific prediction. Check out that section on this blog: https://cloud.google.com/blog/products/ai-machine-learning/explaining-model-predictions-structured-data/ Cool stuff!
upvoted 5 times
...
M25
Most Recent 1 year ago
Selected Answer: A
Went with A
upvoted 1 times
...
tavva_prudhvi
1 year, 1 month ago
Local feature importance can provide insight into the specific features that contributed to the model's decision for a particular instance. This information can be used to explain the model's decision to the bank's risks department and potentially identify any issues or biases in the model. Option B is not applicable as the loan request has already been rejected by the model, so there are no target values to correlate with. Option C may provide some insights, but local feature importance will provide more specific information for this particular instance. Option D involves changing the features, which may not be feasible or ethical in this case.
upvoted 2 times
...
Yajnas_arpohc
1 year, 1 month ago
C seems more apt and exhaustive for the bank's purpose; it uses various feature attribution methods. A explains how much each feature added to or subtracted from the result compared with the baseline prediction score; indicative, but less optimal for the purpose at hand.
upvoted 1 times
...
enghabeth
1 year, 3 months ago
Selected Answer: A
I think it's easier to explain with feature importance.
upvoted 2 times
...
ares81
1 year, 4 months ago
Selected Answer: C
AutoML Tables tells you how much each feature impacts this model. It is shown in the Feature importance graph. The values are provided as a percentage for each feature: the higher the percentage, the more strongly that feature impacted model training. C.
upvoted 1 times
...
hiromi
1 year, 4 months ago
Selected Answer: A
A https://cloud.google.com/automl-tables/docs/explain#local
upvoted 2 times
...
mil_spyro
1 year, 4 months ago
Selected Answer: A
Agree with A. "Local feature importance gives you visibility into how the individual features in a specific prediction request affected the resulting prediction. Each local feature importance value shows only how much the feature affected the prediction for that row. To understand the overall behavior of the model, use model feature importance." https://cloud.google.com/automl-tables/docs/explain#local
upvoted 4 times
...
ares81
1 year, 5 months ago
Selected Answer: C
"Feature importance: AutoML Tables tells you how much each feature impacts this model. It is shown in the Feature importance graph. The values are provided as a percentage for each feature: the higher the percentage, the more strongly that feature impacted model training." The correct answer is C.
upvoted 1 times
tavva_prudhvi
1 year, 1 month ago
Can you tell the feature importance for a specific prediction?
upvoted 2 times
...
...
YangG
1 year, 5 months ago
Selected Answer: A
Should be A. It is specific to this example, so use local feature importance.
upvoted 2 times
...
ares81
1 year, 5 months ago
It seems C, to me.
upvoted 1 times
...

Topic 1 Question 89

Exam Professional Machine Learning Engineer topic 1 question 89 discussion

You work for a magazine distributor and need to build a model that predicts which customers will renew their subscriptions for the upcoming year. Using your company’s historical data as your training set, you created a TensorFlow model and deployed it to AI Platform. You need to determine which customer attribute has the most predictive power for each prediction served by the model. What should you do?

  • A. Use AI Platform notebooks to perform a Lasso regression analysis on your model, which will eliminate features that do not provide a strong signal.
  • B. Stream prediction results to BigQuery. Use BigQuery’s CORR(X1, X2) function to calculate the Pearson correlation coefficient between each feature and the target variable.
  • C. Use the AI Explanations feature on AI Platform. Submit each prediction request with the ‘explain’ keyword to retrieve feature attributions using the sampled Shapley method.
  • D. Use the What-If tool in Google Cloud to determine how your model will perform when individual features are excluded. Rank the feature importance in order of those that caused the most significant performance drop when removed from the model.
Suggested Answer: C 🗳️

Comments

SubbuJV
Highly Voted 1 year, 2 months ago
Selected Answer: C
Vertex AI Explanations; went with C.
upvoted 5 times
...
NamitSehgal
Most Recent 8 months, 3 weeks ago
Selected Answer: C
Feature attributions for individual predictions: direct and efficient.
upvoted 1 times
...
M25
2 years ago
Selected Answer: C
Went with C
upvoted 2 times
...
CloudKida
2 years ago
Selected Answer: C
https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/overview AI Explanations helps you understand your model's outputs for classification and regression tasks. Whenever you request a prediction on AI Platform, AI Explanations tells you how much each feature in the data contributed to the predicted result.
upvoted 1 times
...
Yajnas_arpohc
2 years, 1 month ago
Key words in the question, "for each prediction served", make it C. D is more of a broader analysis activity.
upvoted 3 times
...
John_Pongthorn
2 years, 3 months ago
Selected Answer: C
You have to use a flagship native service as much as possible.
upvoted 1 times
...
hiromi
2 years, 4 months ago
Selected Answer: D
I vote for D - https://www.tensorflow.org/tensorboard/what_if_tool - https://pair-code.github.io/what-if-tool/ - https://medium.com/red-buffer/tensorflows-what-if-tool-c52914ea215c C is wrong because AI Explanations doesn't work for TensorFlow models (https://cloud.google.com/vertex-ai/docs/explainable-ai/overview)
upvoted 1 times
hiromi
2 years, 4 months ago
Sorry, I think C is the answer.
upvoted 3 times
...
b7ad1d9
1 month, 3 weeks ago
Vertex AI Explainability does work with custom TF models. Both are Google products!
upvoted 1 times
...
mil_spyro
2 years, 4 months ago
This is from the doc you provided: "Feature attribution is supported for all types of models (both AutoML and custom-trained), frameworks (TensorFlow, scikit, XGBoost), and modalities (images, text, tabular, video)." https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#supported_model_types_2
upvoted 3 times
hiromi
2 years, 4 months ago
Sorry, I mean the Shapley method doesn't support TensorFlow models. See https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#compare-methods
upvoted 1 times
...
hiromi
2 years, 4 months ago
Sorry, I think C is the answer. Thanks.
upvoted 2 times
...
...
...
mil_spyro
2 years, 4 months ago
Selected Answer: C
AI Explanations provides feature attributions using the sampled Shapley method, which can help you understand how much each feature contributes to a model's prediction.
upvoted 4 times
...
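For context on option C's "sampled Shapley method": the core idea can be sketched in plain Python, attributing one prediction by averaging each feature's marginal contribution over random feature orderings. This is an illustrative sketch of the sampling idea, not AI Platform's implementation; the `model`, `weights`, and inputs below are invented.

```python
import random

def sampled_shapley(f, instance, baseline, n_samples=200, seed=0):
    """Estimate per-feature attributions for one prediction by averaging
    each feature's marginal contribution over random feature orderings."""
    rng = random.Random(seed)
    n = len(instance)
    attributions = [0.0] * n
    for _ in range(n_samples):
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)   # start from the baseline input
        prev = f(current)
        for i in order:            # flip features to their real values, one by one
            current[i] = instance[i]
            new = f(current)
            attributions[i] += new - prev
            prev = new
    return [a / n_samples for a in attributions]

# Toy "model": a linear function, so the exact attribution of feature i
# is weights[i] * (instance[i] - baseline[i]).
weights = [2.0, -1.0, 0.5]
def model(x):
    return sum(w * v for w, v in zip(weights, x))

attrs = sampled_shapley(model, instance=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
```

A useful sanity check on any Shapley-style attribution: the attributions sum to the difference between the prediction on the instance and the prediction on the baseline.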
ares81
2 years, 5 months ago
Selected Answer: C
"AI Explanations helps you understand your model's outputs for classification and regression tasks. Whenever you request a prediction on AI Platform, AI Explanations tells you how much each feature in the data contributed to the predicted result." It's C!
upvoted 2 times
...
JeanEl
2 years, 5 months ago
Selected Answer: C
Agree with C
upvoted 2 times
...

Topic 1 Question 90


Exam Professional Machine Learning Engineer topic 1 question 90 discussion

You are working on a binary classification ML algorithm that detects whether an image of a classified scanned document contains a company’s logo. In the dataset, 96% of examples don’t have the logo, so the dataset is very skewed. Which metrics would give you the most confidence in your model?

  • A. F-score where recall is weighed more than precision
  • B. RMSE
  • C. F1 score
  • D. F-score where precision is weighed more than recall
Suggested Answer: A 🗳️

Comments

tavva_prudhvi
Highly Voted 2 years, 7 months ago
Selected Answer: A
In this scenario, the dataset is highly imbalanced, where most of the examples do not have the company's logo. Therefore, accuracy could be misleading as the model can have high accuracy by simply predicting that all images do not have the logo. F1 score is a good metric to consider in such cases, as it takes both precision and recall into account. However, since the dataset is highly skewed, we should weigh recall more than precision to ensure that the model is correctly identifying the images that do have the logo. Therefore, F-score where recall is weighed more than precision is the best metric to evaluate the performance of the model in this scenario. Option B (RMSE) is not applicable to this classification problem, and option D (F-score where precision is weighed more than recall) is not suitable for highly skewed datasets.
upvoted 16 times
...
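The distinction between options A, C, and D comes down to the beta parameter of the F-score: beta > 1 weighs recall more, beta < 1 weighs precision more, and beta = 1 is the ordinary F1. A minimal pure-Python sketch (the precision/recall numbers are invented):

```python
def fbeta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: weighted harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)

# Same precision/recall, three different weightings:
p, r = 0.5, 0.8
f1 = fbeta(p, r, beta=1.0)      # balanced (option C)
f2 = fbeta(p, r, beta=2.0)      # recall weighed more (option A)
f_half = fbeta(p, r, beta=0.5)  # precision weighed more (option D)
```

With recall higher than precision here, the recall-weighted score comes out highest, which is exactly the behavior the A-vs-D debate in the comments is about.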
egdiaa
Highly Voted 2 years, 10 months ago
Answer C: F1-Score is the best for imbalanced Data like this case: https://stephenallwright.com/imbalanced-data-metric/
upvoted 5 times
...
Ankit267
Most Recent 10 months, 2 weeks ago
Selected Answer: C
Depending on the positive class, between A and D it is more likely A; but the most important thing, the penalty for misclassifying, is missing, therefore going with F1, the safest choice.
upvoted 1 times
...
jkkim_jt
1 year ago
Selected Answer: C
F1-score: harmonic mean of precision and recall. It ensures that both false positives and false negatives are considered. An F-score focusing on recall may be useful if missing a logo is more costly than incorrectly identifying one.
upvoted 2 times
...
gscharly
1 year, 6 months ago
Selected Answer: C
I'd go with C. We don't know which option (less FP or less FN) is most important for business with the provided information, so we should seek a balance.
upvoted 3 times
...
etienne0
1 year, 8 months ago
Selected Answer: D
I think it's D.
upvoted 1 times
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: D
I think it could be D, but the question does not provide enough information. My feeling: if 4% have the logo, we are looking just for those, right? So the 'quality of the TPs', that is, precision, could be more interesting because we want a model we can rely on: when this model predicts an image has a logo, we will be more certain about it. If we weigh recall instead, a model with 99% recall has more chance of catching the logos, but without quality: it could flag a lot of images without the logo. It is better to use any ML than this...
upvoted 3 times
...
pico
1 year, 12 months ago
Selected Answer: C
both option A (F-score with higher weight on recall) and option C (F1 score) could be suitable depending on the specific priorities and requirements of your classification problem. If missing a company's logo is considered more problematic than having false alarms, then option A might be preferred. The F1 score (option C) is a balanced measure that considers both precision and recall, which is generally a good choice in imbalanced datasets. Ultimately, the choice between option A and option C depends on the specific goals and constraints of your application.
upvoted 3 times
...
Mickey321
1 year, 12 months ago
Selected Answer: C
The question does not state a clear preference for recall or precision, hence going with C.
upvoted 4 times
...
Jarek7
2 years, 4 months ago
Selected Answer: C
Yeah, I know - everyone is voting A... To be honest I still don't understand why you are more afraid of these few FNs than FPs. In my opinion they are exactly the same evil. All the documentation says that F1 is great on skewed data. You should use a weighted F-score when you know which is worse for you, FNs or FPs. In this case we have no hints about it, so I would stay with the ordinary F1.
upvoted 5 times
...
Voyager2
2 years, 5 months ago
Selected Answer: A
A. F-score where recall is weighed more than precision. Even a model that always says an image doesn't have the logo will have good precision, because that is the most common case. What we need is to improve recall.
upvoted 2 times
...
M25
2 years, 6 months ago
Selected Answer: A
Went with A
upvoted 2 times
...
guilhermebutzke
2 years, 8 months ago
Selected Answer: A
I think it is A. The positive class is the minority. So it's more important to correctly detect logos in all images that have a logo (recall) than to correctly detect logos only among images classified as having logos (precision).
upvoted 3 times
...
enghabeth
2 years, 9 months ago
Selected Answer: A
I think it is D, because you try to detect TPs; then recall is more important than precision.
upvoted 3 times
...
ares81
2 years, 10 months ago
Selected Answer: A
Answer A is my choice.
upvoted 1 times
...
Abhijat
2 years, 10 months ago
A is correct
upvoted 1 times
...
Dataspire
2 years, 10 months ago
Selected Answer: A
Fewer logo images; recall should be weighted more.
upvoted 4 times
...

Topic 1 Question 91


Exam Professional Machine Learning Engineer topic 1 question 91 discussion

You work on the data science team for a multinational beverage company. You need to develop an ML model to predict the company’s profitability for a new line of naturally flavored bottled waters in different locations. You are provided with historical data that includes product types, product sales volumes, expenses, and profits for all regions. What should you use as the input and output for your model?

  • A. Use latitude, longitude, and product type as features. Use profit as model output.
  • B. Use latitude, longitude, and product type as features. Use revenue and expenses as model outputs.
  • C. Use product type and the feature cross of latitude with longitude, followed by binning, as features. Use profit as model output.
  • D. Use product type and the feature cross of latitude with longitude, followed by binning, as features. Use revenue and expenses as model outputs.
Suggested Answer: C 🗳️

Comments

hiromi
Highly Voted 2 years, 4 months ago
Selected Answer: C
C (not sure) - https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture - https://developers.google.com/machine-learning/crash-course/regularization-for-sparsity/l1-regularization
upvoted 7 times
...
tavva_prudhvi
Highly Voted 2 years, 1 month ago
Selected Answer: C
Option C is the best option because it takes into account both the product type and location, which can affect profitability. Binning the feature cross of latitude and longitude can help capture the nonlinear relationship between location and profitability, and using profit as the model output is appropriate because it's the target variable we want to predict.
upvoted 5 times
...
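Option C's "feature cross of latitude with longitude, followed by binning" can be illustrated in a few lines: bucketize each coordinate into discrete bins, then cross the two bins into a single categorical key. This is a toy sketch; the bin count and key format are invented, and in practice frameworks like TensorFlow provide this via bucketized and crossed feature columns.

```python
def bucketize(value: float, low: float, high: float, n_bins: int) -> int:
    """Map a continuous value to a discrete bin index in [0, n_bins)."""
    if value <= low:
        return 0
    if value >= high:
        return n_bins - 1
    return int((value - low) / (high - low) * n_bins)

def lat_lon_cross(lat: float, lon: float, n_bins: int = 10) -> str:
    """Cross bucketized latitude with bucketized longitude into one
    categorical feature, so the model can learn per-region effects."""
    lat_bin = bucketize(lat, -90.0, 90.0, n_bins)
    lon_bin = bucketize(lon, -180.0, 180.0, n_bins)
    return f"lat{lat_bin}_lon{lon_bin}"
```

The crossed key lets a model learn a separate signal per geographic cell, which a raw (lat, lon) pair fed to a linear model cannot express.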
sonicclasps
Most Recent 1 year, 3 months ago
Selected Answer: D
The question asks to predict profitability, not profit. Profitability is calculated from revenue and expenses. The correct answer is D.
upvoted 3 times
...
andresvelasco
1 year, 7 months ago
Most people have chosen C, but does it make sense to do binning after the feature cross? Isn't it the other way around?
upvoted 2 times
maukaba
1 year, 6 months ago
I agree it is the other way around. See this example: https://developers.google.com/machine-learning/crash-course/feature-crosses/check-your-understanding One feature cross: [binned latitude X binned longitude X binned roomsPerPerson]
upvoted 1 times
maukaba
1 year, 6 months ago
In the following examples it is said that it is not possible to cross lat & lon without bucketizing them first, since continuous values must be converted to discrete values before crossing: https://www.kaggle.com/code/vikramtiwari/feature-crosses-tensorflow-mlcc
upvoted 1 times
...
...
...
M25
2 years ago
Selected Answer: C
Went with C
upvoted 1 times
...
abneural
2 years, 2 months ago
Selected Answer: C
Agreeing with hiromi and taxberg: feature cross and bucket lat and lon on geographical problems.
upvoted 1 times
...
enghabeth
2 years, 3 months ago
Selected Answer: C
your output is profit
upvoted 1 times
...
taxberg
2 years, 3 months ago
Selected Answer: C
Must be C. Always feature cross lat and lon on geographical problems. Also, D cannot be right, as we do not have revenue in the dataset.
upvoted 3 times
...
mil_spyro
2 years, 4 months ago
Selected Answer: A
In this case, there is no need to reduce the number of unique values in the latitude and longitude variables, and binning would lose information from those features, hence A.
upvoted 2 times
hiromi
2 years, 4 months ago
Why no need to reduce?
upvoted 1 times
...
mil_spyro
2 years, 4 months ago
binning and crossing*
upvoted 1 times
...
...
ares81
2 years, 5 months ago
Selected Answer: C
Easy C.
upvoted 2 times
...

Topic 1 Question 92


Exam Professional Machine Learning Engineer topic 1 question 92 discussion

You work as an ML engineer at a social media company, and you are developing a visual filter for users’ profile photos. This requires you to train an ML model to detect bounding boxes around human faces. You want to use this filter in your company’s iOS-based mobile phone application. You want to minimize code development and want the model to be optimized for inference on mobile phones. What should you do?

  • A. Train a model using AutoML Vision and use the “export for Core ML” option.
  • B. Train a model using AutoML Vision and use the “export for Coral” option.
  • C. Train a model using AutoML Vision and use the “export for TensorFlow.js” option.
  • D. Train a custom TensorFlow model and convert it to TensorFlow Lite (TFLite).
Suggested Answer: A 🗳️

Comments

pshemol
Highly Voted 1 year, 10 months ago
Selected Answer: A
https://cloud.google.com/vision/automl/docs/export-edge Core ML -> iOS and macOS Coral -> Edge TPU-based device TensorFlow.js -> web
upvoted 19 times
maukaba
1 year ago
Updated Vertex AI link: https://cloud.google.com/vertex-ai/docs/export/export-edge-model Trained AutoML Edge image classification models can be exported in the following formats: TF Lite - to run your model on edge or mobile devices. Edge TPU TF Lite - to run your model on Edge TPU devices. Container - to run on a Docker container. Core ML - to run your model on iOS and macOS devices. Tensorflow.js - to run your model in the browser and in Node.js.
upvoted 6 times
...
...
M25
Most Recent 1 year, 6 months ago
Selected Answer: A
Went with A
upvoted 1 times
...
TNT87
1 year, 6 months ago
https://developer.apple.com/documentation/coreml Answer A
upvoted 1 times
TNT87
1 year, 6 months ago
https://cloud.google.com/vertex-ai/docs/export/export-edge-model#export
upvoted 1 times
...
...
shankalman717
1 year, 8 months ago
Selected Answer: B
AutoML Vision is a service provided by Google Cloud that enables developers to train and deploy machine learning models for image recognition tasks, such as detecting bounding boxes around human faces. The “export for Coral” option generates a TFLite model that is optimized for running on Coral, a hardware platform specifically designed for edge computing, including mobile devices. The TFLite model is also compatible with iOS-based mobile phone applications, making it easy to integrate into the company's app.
upvoted 1 times
tavva_prudhvi
1 year, 7 months ago
While Coral can be used to optimize machine learning models for inference on edge devices, it's not the best option for an iOS-based mobile phone application.
upvoted 1 times
...
...
shankalman717
1 year, 8 months ago
Selected Answer: B
Option A, using AutoML Vision and exporting for Core ML, is also a viable option. Core ML is Apple's machine learning framework that is optimized for iOS-based devices. However, using this option would require more development effort to integrate the Core ML model into the app. Option C, using AutoML Vision and exporting for TensorFlow.js, is not the best option for this scenario since it is optimized for running on web browsers, not mobile devices. Option D, training a custom TensorFlow model and converting it to TFLite, would require significant development effort and time compared to using AutoML Vision. AutoML Vision provides a simple and efficient way to train and deploy machine learning models without requiring expertise in machine learning.
upvoted 1 times
tavva_prudhvi
1 year, 7 months ago
Excellent reasoning for C,D but Core ML is Apple's machine learning framework that is optimized for iOS-based devices, and exporting the model to Core ML format can help minimize inference time on mobile devices.
upvoted 1 times
...
...
enghabeth
1 year, 9 months ago
Selected Answer: D
https://www.tensorflow.org/lite https://medium.com/the-ai-team/step-into-on-device-inference-with-tensorflow-lite-a47242ba9130
upvoted 1 times
tavva_prudhvi
1 year, 7 months ago
It's wrong. While TFLite is a mobile-optimized version of TensorFlow, it requires more code development than using AutoML Vision and exporting for Core ML. Therefore, it's not the best option for minimizing code development time.
upvoted 1 times
...
...
ares81
1 year, 10 months ago
Selected Answer: A
I correct myself: it's A!
upvoted 1 times
...
egdiaa
1 year, 10 months ago
A indeed as described here: https://cloud.google.com/vision/automl/docs/export-edge
upvoted 1 times
...
hiromi
1 year, 10 months ago
Selected Answer: A
A "You want to minimize code development" -> AutoML - https://cloud.google.com/vision/automl/docs/tflite-coreml-ios-tutorial - https://cloud.google.com/vertex-ai/docs/training-overview#image
upvoted 2 times
...
mil_spyro
1 year, 10 months ago
Selected Answer: D
TensorFlow Lite is a lightweight version of TensorFlow that is optimized for mobile and embedded devices, making it an ideal choice for use in an iOS-based mobile phone application.
upvoted 2 times
...
ares81
1 year, 11 months ago
Selected Answer: D
I find no answer is 100% right, but D seems closer to the truth.
upvoted 1 times
ares81
1 year, 10 months ago
It's A.
upvoted 1 times
...
...

Topic 1 Question 93


Exam Professional Machine Learning Engineer topic 1 question 93 discussion

You have been asked to build a model using a dataset that is stored in a medium-sized (~10 GB) BigQuery table. You need to quickly determine whether this data is suitable for model development. You want to create a one-time report that includes both informative visualizations of data distributions and more sophisticated statistical analyses to share with other ML engineers on your team. You require maximum flexibility to create your report. What should you do?

  • A. Use Vertex AI Workbench user-managed notebooks to generate the report.
  • B. Use the Google Data Studio to create the report.
  • C. Use the output from TensorFlow Data Validation on Dataflow to generate the report.
  • D. Use Dataprep to create the report.
Suggested Answer: A 🗳️

Comments

dija123
1 year, 4 months ago
Selected Answer: A
It is a data science request that can be handled in a Jupyter notebook.
upvoted 2 times
...
gscharly
1 year, 7 months ago
Selected Answer: A
More flexibility
upvoted 2 times
...
SubbuJV
1 year, 9 months ago
Selected Answer: A
More flexibility
upvoted 1 times
...
Mickey321
1 year, 12 months ago
Selected Answer: A
Max flexibility
upvoted 2 times
...
Krish6488
1 year, 12 months ago
Selected Answer: A
Looker Studio is good too, but it does not give the same depth of statistical analysis as matplotlib, seaborn, etc. do in a notebook. So a JupyterLab notebook, a.k.a. Vertex AI Workbench, for me.
upvoted 3 times
...
MCorsetti
2 years ago
Selected Answer: A
A, as it is a one-off report with maximum flexibility. You don't need a dashboard unless it is being reused.
upvoted 1 times
...
lalala_meow
2 years, 1 month ago
Selected Answer: A
A for more sophisticated statistical analyses and maximum flexibility
upvoted 2 times
...
andresvelasco
2 years, 2 months ago
Selected Answer: A
A (AI workbench): "sophisticated"
upvoted 1 times
...
NickHapton
2 years, 4 months ago
1. One-time, 2. flexibility: go for A
upvoted 2 times
...
SamuelTsch
2 years, 4 months ago
Selected Answer: A
went with A, because of max. flexibility
upvoted 1 times
...
PST21
2 years, 4 months ago
Correct answer: A. While Google Data Studio (Option B) is a powerful data visualization and reporting tool, it might not provide the same level of flexibility and sophistication for statistical analyses compared to a notebook environment.
upvoted 2 times
...
CloudKida
2 years, 6 months ago
Selected Answer: C
TensorFlow Data Validation (TFDV) can compute descriptive statistics that provide a quick overview of the data in terms of the features that are present and the shapes of their value distributions. Tools such as Facets Overview can provide a succinct visualization of these statistics for easy browsing.
upvoted 3 times
...
lucaluca1982
2 years, 6 months ago
Selected Answer: A
A. Flexibility is the key.
upvoted 1 times
...
frangm23
2 years, 6 months ago
Selected Answer: B
I think it has to be B. One key is that it says "quickly," and BigQuery makes it very easy to export the query into Looker Studio. The other is that it offers maximum flexibility within the needs of this case (informative visualizations + statistical analysis), as we can develop and write custom formulas. A feels like overkill: using a Deep Learning VM image only to describe data and perform some analysis. C also feels like overkill, starting to develop a neural net for that. D: although you may use Dataprep for this, it is less suited than A.
upvoted 2 times
...
kucuk_kagan
2 years, 7 months ago
Selected Answer: A
I recommend option A because Vertex AI Workbench user-managed notebooks provide more flexibility and customization for analyzing and visualizing the data in the BigQuery table. Using Python libraries (pandas, matplotlib, seaborn, etc.), you can create visualizations of the data distributions and perform more complex statistical analyses.
upvoted 1 times
...
JamesDoe
2 years, 7 months ago
Selected Answer: A
I think it's A. A one-time report containing statistical measurements of a real dataset, to tell if the data is suitable for model development. The target audience is also other ML engineers. Getting a whole report of exactly this with TFDV/Facets is like two lines of code: https://www.tensorflow.org/tfx/data_validation/get_started A similar Data Studio report would take lots of time and work, and there would be no benefit from reusability since the task was a one-time job.
upvoted 2 times
JamesDoe
2 years, 7 months ago
Depending on your definition of "You require maximum flexibility to create your report.", it could very well be B too.
upvoted 1 times
...
...
hghdh5454
2 years, 7 months ago
Selected Answer: A
A. Use Vertex AI Workbench user-managed notebooks to generate the report. By using Vertex AI Workbench user-managed notebooks, you can create a one-time report that includes both informative visualizations and sophisticated statistical analyses. The notebooks provide maximum flexibility for data analysis, as they allow you to use a wide range of libraries and tools to create visualizations, perform statistical tests, and share your findings with your team. You can easily connect to the BigQuery table from the notebook and perform the necessary data exploration and analysis.
upvoted 1 times
...
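As a sketch of the first-pass check a notebook makes easy: compute per-column descriptive statistics over rows pulled from BigQuery. In practice you would use pandas, seaborn, or TFDV as the comments above suggest; this dependency-free version with invented column names just shows the idea.

```python
def summarize(rows):
    """Per-column count, mean, std, min, max: a quick data-suitability check."""
    cols = {}
    for row in rows:
        for name, value in row.items():
            cols.setdefault(name, []).append(value)
    report = {}
    for name, values in cols.items():
        n = len(values)
        mean = sum(values) / n
        std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
        report[name] = {"count": n, "mean": mean, "std": std,
                        "min": min(values), "max": max(values)}
    return report

# Invented rows standing in for a BigQuery query result:
sample = [{"cpu": 1.0, "mem": 10.0}, {"cpu": 3.0, "mem": 30.0}]
stats = summarize(sample)
```

In a notebook the same report would then be plotted (histograms, correlation matrices) and extended with whatever statistical tests the team needs, which is where the "maximum flexibility" of option A comes from.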

Topic 1 Question 94


Exam Professional Machine Learning Engineer topic 1 question 94 discussion

You work on an operations team at an international company that manages a large fleet of on-premises servers located in few data centers around the world. Your team collects monitoring data from the servers, including CPU/memory consumption. When an incident occurs on a server, your team is responsible for fixing it. Incident data has not been properly labeled yet. Your management team wants you to build a predictive maintenance solution that uses monitoring data from the VMs to detect potential failures and then alerts the service desk team. What should you do first?

  • A. Train a time-series model to predict the machines’ performance values. Configure an alert if a machine’s actual performance values significantly differ from the predicted performance values.
  • B. Implement a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data. Train a model to predict anomalies based on this labeled dataset.
  • C. Develop a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data. Test this heuristic in a production environment.
  • D. Hire a team of qualified analysts to review and label the machines’ historical performance data. Train a model based on this manually labeled dataset.
Suggested Answer: B 🗳️

Comments

mil_spyro
Highly Voted 2 years, 10 months ago
Selected Answer: C
I would go for C, it is important to have a clear understanding of what constitutes a potential failure and how to detect it. A heuristic based on z-scores, for example, can be used to flag instances where the performance values of a machine significantly differ from its historical baseline.
upvoted 10 times
...
pico
Highly Voted 2 years, 1 month ago
Selected Answer: B
NOT C: since when do you test something directly in production?? Option B involves labeling historical data using heuristics, which can be a practical and quick way to get started.
upvoted 7 times
...
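The z-score heuristic named in options B and C can be sketched in a few lines of plain Python: a reading is labeled anomalous when it sits more than a chosen number of standard deviations from the series mean. This is a toy sketch with invented readings; real monitoring data would be per-machine and windowed.

```python
def zscore_labels(values, threshold=3.0):
    """Label each reading True (anomaly) if it lies more than `threshold`
    standard deviations from the mean of the series."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [abs(v - mean) / std > threshold for v in values]

# Invented CPU readings: a stable series with one spike at the end.
readings = [10.0] * 20 + [100.0]
labels = zscore_labels(readings)
```

The resulting boolean labels are exactly the kind of weak labels option B proposes feeding into a supervised anomaly model.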
mouthwash
Most Recent 10 months, 2 weeks ago
Selected Answer: B
Since when is developing and testing allowed in prod? Answer is B
upvoted 4 times
...
rajshiv
11 months, 1 week ago
Selected Answer: B
Option B - because it allows you to efficiently label the data using a heuristic approach (e.g., z-score), and then train an anomaly detection model on that labeled data.
upvoted 2 times
...
Pau1234
11 months, 1 week ago
Selected Answer: B
C -> no testing in prod; it could lead to risks. Hence B.
upvoted 3 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: B
Why not C? A heuristic in production without a model: directly deploying a heuristic in a production environment without testing it within a model can lead to many false positives and alert fatigue for the service desk team.
upvoted 2 times
...
razmik
2 years, 4 months ago
Selected Answer: C
Vote for C Reference: Rule #1: Don’t be afraid to launch a product without machine learning. https://developers.google.com/machine-learning/guides/rules-of-ml#before_machine_learning
upvoted 1 times
...
julliet
2 years, 4 months ago
Selected Answer: C
A simple solution goes first; a more sophisticated one comes after.
upvoted 2 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 2 times
...
TNT87
2 years, 6 months ago
Answer C. Same as question 139.
upvoted 2 times
...
studybrew
2 years, 7 months ago
What’s the difference between B and C?
upvoted 3 times
julliet
2 years, 5 months ago
In B you label with heuristics and still develop a model; in C you follow the ML rules to adopt a simple solution first and later decide if, how, and where you need a more sophisticated model.
upvoted 2 times
...
...
tavva_prudhvi
2 years, 7 months ago
Selected Answer: C
This is the best option for this scenario because it's quick and inexpensive, and it can provide a baseline for labeling the historical performance data. Once we have labeled data, we can train a predictive maintenance model to detect potential failures and alert the service desk team.
upvoted 1 times
...
osaka_monkey
2 years, 8 months ago
Why not D?
upvoted 1 times
tavva_prudhvi
2 years, 7 months ago
While this approach may result in accurate labeling of the historical performance data, it can be time-consuming and expensive.
upvoted 1 times
...
...
John_Pongthorn
2 years, 9 months ago
Selected Answer: C
https://www.geeksforgeeks.org/z-score-for-outlier-detection-python/
upvoted 1 times
...
hiromi
2 years, 10 months ago
Selected Answer: B
I vote for B - https://developers.google.com/machine-learning/guides/rules-of-ml
upvoted 3 times
hiromi
2 years, 10 months ago
Sorry, I think C is the answer
upvoted 4 times
jamesking1103
2 years, 10 months ago
C. We need to detect potential failures.
upvoted 1 times
guilhermebutzke
2 years, 8 months ago
Why not B? The team wants to create a model to predict failures. So the z-score is used to label the failure scenarios, which are then used to build a prediction model.
upvoted 2 times
tavva_prudhvi
2 years, 7 months ago
While this approach may work in some cases, it's not guaranteed to work well in this scenario because we don't know the nature of the anomalies that we want to detect. Therefore, it may be difficult to come up with a heuristic that can accurately label the historical performance data.
upvoted 2 times
evanfebrianto
2 years, 5 months ago
But testing the heuristic in a production environment without training a model could be risky and lead to false alarms or misses.
upvoted 1 times
...
...
...
...
...
...
ares81
2 years, 11 months ago
Selected Answer: A
This is really tricky, but it could be A.
upvoted 3 times
ares81
2 years, 10 months ago
Thinking about it, it should be C.
upvoted 1 times
...
...

Topic 1 Question 95


Exam Professional Machine Learning Engineer topic 1 question 95 discussion

You are developing an ML model that uses sliced frames from video feed and creates bounding boxes around specific objects. You want to automate the following steps in your training pipeline: ingestion and preprocessing of data in Cloud Storage, followed by training and hyperparameter tuning of the object model using Vertex AI jobs, and finally deploying the model to an endpoint. You want to orchestrate the entire pipeline with minimal cluster management. What approach should you use?

  • A. Use Kubeflow Pipelines on Google Kubernetes Engine.
  • B. Use Vertex AI Pipelines with TensorFlow Extended (TFX) SDK.
  • C. Use Vertex AI Pipelines with Kubeflow Pipelines SDK.
  • D. Use Cloud Composer for the orchestration.
Suggested Answer: C 🗳️

Comments

qaz09
Highly Voted 2 years, 9 months ago
From: https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline#sdk "1. If you use TensorFlow in an ML workflow that processes terabytes of structured data or text data, we recommend that you build your pipeline using TFX. To learn more about building a TFX pipeline, follow the TFX getting started tutorials. To learn more about using Vertex AI Pipelines to run a TFX pipeline, follow the TFX on Google Cloud tutorials. 2. For other use cases, we recommend that you build your pipeline using the Kubeflow Pipelines SDK. By building a pipeline with the Kubeflow Pipelines SDK, you can implement your workflow by building custom components or reusing prebuilt components, such as the Google Cloud Pipeline Components. Google Cloud Pipeline Components make it easier to use Vertex AI services like AutoML in your pipeline." So I guess since it is image processing, it should be Kubeflow - answer C (TFX is for structured or text data).
upvoted 13 times
baimus
1 year, 2 months ago
TFX absolutely does support things other than structured and text datasets.
upvoted 2 times
5091a99
8 months, 1 week ago
Agreed. The Answer is TFX. This question is possibly old or wrong. TFX supports image data, and Vertex pipelines automates cluster management which was a critical priority.
upvoted 1 times
...
...
...
Ankit267
Most Recent 10 months, 2 weeks ago
Selected Answer: C
D can work or not depending on the data; C works 100% irrespective of the data, therefore C.
upvoted 1 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: B
Vertex AI Pipelines: This is Google Cloud's managed service for orchestrating ML workflows. It handles the execution of your pipeline steps, manages dependencies, and provides monitoring and logging capabilities. Crucially, it minimizes cluster management, which is a key requirement for you. TensorFlow Extended (TFX) SDK: TFX is a powerful SDK specifically designed for building production-ready ML pipelines. It provides pre-built components for common tasks like data ingestion, preprocessing, training, evaluation, and deployment, making it well-suited for your video frame processing and object detection pipeline. Integration with Vertex AI: TFX components are designed to work seamlessly with Vertex AI services, including training jobs and endpoints. This ensures smooth transitions between pipeline steps and simplifies model deployment.
upvoted 3 times
...
baimus
1 year, 2 months ago
Selected Answer: B
This question is designed to be TFX. It would be a weird thing to do to say "Vertex pipelines with kubeflow SDK" because that's just one of the ways to implement stuff in vertex pipelines, which it doesn't normally specify. TFX adds the things in the question on top of the functionality of Vertex.
upvoted 3 times
...
San1111111111
1 year, 3 months ago
Selected Answer: B
Minimal cluster management should rule out option C. Why has everyone chosen that? It should be B.
upvoted 3 times
...
pawan94
1 year, 6 months ago
You have to understand the ML lifecycle and the difference between TFX and KFP better here. For the end-to-end ML lifecycle, TFX is a better option: you can ensure drift detection / train-serve skew checks with TFDV, you can easily perform serving with TF Serving, and you can easily integrate TFX with Vertex AI Pipelines, which runs serverless. All these features are not directly available/managed in KFP (as it is a user-centric, user-managed library). So I would go with B here.
upvoted 4 times
...
pinimichele01
1 year, 6 months ago
Selected Answer: C
If you use TensorFlow in an ML workflow that processes terabytes of structured data or text data, you should use TFX. For other use cases, Kubeflow. Link: https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline
upvoted 2 times
...
Ulule
1 year, 8 months ago
Selected Answer: B
Overall, using Vertex AI Pipelines with TensorFlow Extended (TFX) SDK provides a comprehensive and managed solution for handling video feed data in an ML pipeline, while minimizing the need for manual infrastructure management and maximizing scalability and efficiency.
upvoted 2 times
...
vale_76_na_xxx
1 year, 11 months ago
I vote for B. The question states that minimal cluster management is required, and I found this in the Google study guide: "Vertex AI Pipelines automatically provisions underlying infrastructure and manages it for you."
upvoted 1 times
...
Mickey321
1 year, 12 months ago
Selected Answer: B
Minimal management.
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
tavva_prudhvi
2 years, 7 months ago
Selected Answer: C
Vertex AI Pipelines with Kubeflow Pipelines SDK provides a high-level interface for building end-to-end machine learning pipelines. This approach allows for easy integration with Google Cloud services, including Cloud Storage for data ingestion and preprocessing, Vertex AI for training and hyperparameter tuning, and deployment to an endpoint. The Kubeflow Pipelines SDK also allows for easy orchestration of the entire pipeline, minimizing cluster management.
upvoted 2 times
...
neochaotic
2 years, 7 months ago
Answer is C. If you use TensorFlow in an ML workflow that processes terabytes of structured data or text data, should use TFX. For other use cases, Kubeflow. Link: https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline
upvoted 2 times
...
TNT87
2 years, 8 months ago
Selected Answer: C
Answer C... https://cloud.google.com/architecture/ml-on-gcp-best-practices#use-vertex-pipelines
upvoted 2 times
TNT87
2 years, 6 months ago
https://cloud.google.com/architecture/ml-on-gcp-best-practices#use-kubeflow-pipelines-sdk-for-flexible-pipeline-construction
upvoted 2 times
...
...
John_Pongthorn
2 years, 9 months ago
Google want you to use core native service Pipeline, Don't overthink but , need to think it over. The anwser is in https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build https://cloud.google.com/vertex-ai/docs/pipelines
upvoted 2 times
...
zeic
2 years, 10 months ago
Selected Answer: B
" You want to orchestrate the entire pipeline with minimal cluster management" because of that it cant be answer c i vote for b, becausse there is no cluster management with vertex ai
upvoted 3 times
TNT87
2 years, 6 months ago
nope, not correct
upvoted 1 times
...
...
hiromi
2 years, 10 months ago
Selected Answer: C
C "If you are using other frameworks, we recommend using Kubeflow Pipeline, which is very flexible and allows you to use simple code to construct pipelines. Kubeflow Pipeline also provides Google Cloud pipeline components such as Vertex AI AutoML." (Journey to Become a Google Cloud Machine Learning Engineer: Build the mind and hand of a Google Certified ML professional)
upvoted 4 times
...
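For readers unfamiliar with the Kubeflow Pipelines SDK that option C refers to, the pipeline in the question chains three stages whose outputs feed the next stage. A pure-Python structural sketch of that dependency chain follows; the real thing would use `@kfp.dsl.component` / `@kfp.dsl.pipeline` decorators and Google Cloud Pipeline Components, and every name below is hypothetical:

```python
# Structural sketch of the pipeline from the question: ingest/preprocess ->
# train + hyperparameter tune on Vertex AI -> deploy to an endpoint.
# Each function stands in for what would be a containerized KFP component.

def preprocess(gcs_uri: str) -> str:
    # Would read video frames from Cloud Storage and write sliced,
    # normalized frames back; here we just derive the output path.
    return gcs_uri.rstrip("/") + "/preprocessed"

def train_and_tune(dataset_uri: str) -> dict:
    # Would submit a Vertex AI hyperparameter tuning job; here we
    # return a placeholder "best trial" record.
    return {"model_uri": dataset_uri + "/model", "best_lr": 1e-3}

def deploy(model: dict) -> str:
    # Would upload the model to the Vertex AI Model Registry and deploy
    # it to an endpoint; here we return a placeholder endpoint name.
    return "endpoint-for-" + model["model_uri"].split("/")[-1]

def pipeline(gcs_uri: str) -> str:
    # In KFP, passing one component's output into the next is what
    # defines the DAG edges; orchestration and clusters are managed
    # by Vertex AI Pipelines, not by the user.
    data = preprocess(gcs_uri)
    model = train_and_tune(data)
    return deploy(model)

print(pipeline("gs://my-bucket/frames"))  # → endpoint-for-model
```

The point of the sketch is the shape, not the bodies: "minimal cluster management" comes from Vertex AI Pipelines executing this DAG serverlessly.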

Topic 1 Question 96


You are training an object detection machine learning model on a dataset that consists of three million X-ray images, each roughly 2 GB in size. You are using Vertex AI Training to run a custom training application on a Compute Engine instance with 32-cores, 128 GB of RAM, and 1 NVIDIA P100 GPU. You notice that model training is taking a very long time. You want to decrease training time without sacrificing model performance. What should you do?

  • A. Increase the instance memory to 512 GB and increase the batch size.
  • B. Replace the NVIDIA P100 GPU with a v3-32 TPU in the training job.
  • C. Enable early stopping in your Vertex AI Training job.
  • D. Use the tf.distribute.Strategy API and run a distributed training job.
Suggested Answer: B 🗳️

Comments

smarques
Highly Voted 2 years, 9 months ago
Selected Answer: C
I would say C. The question asks about time, so the option "early stopping" looks fine because it will no impact the existent accuracy (it will maybe improve it). The tf.distribute.Strategy reading the TF docs says that it's used when you want to split training between GPUs, but the question says that we have a single GPU. Open to discuss. :)
upvoted 7 times
djo06
2 years, 4 months ago
tf.distribute.OneDeviceStrategy uses parallel training on one GPU
upvoted 2 times
...
...
phani49
Highly Voted 10 months, 3 weeks ago
Selected Answer: D
D. Use the tf.distribute.Strategy API and run a distributed training job. Why it's correct:
  • Distributed training splits the dataset and workload across multiple machines and GPUs/TPUs, dramatically reducing training time.
  • The tf.distribute.Strategy API supports both synchronous and asynchronous distributed training, allowing scaling across multiple GPUs or TPUs in Vertex AI.
  • It is specifically designed for handling large datasets and computationally intensive tasks.
Example strategies:
  • MultiWorkerMirroredStrategy: for synchronous training on multiple machines with GPUs.
  • TPUStrategy: for training across multiple TPUs.
It scales horizontally, effectively handling massive datasets like the 3-million-image X-ray dataset.
upvoted 6 times
...
Begum
Most Recent 6 months, 1 week ago
Selected Answer: C
tf.distribute.Strategy currently does not support TensorFlow's partitioned variables (where a single variable is split across multiple devices), leaving the option of moving to a TPU to accelerate the training.
upvoted 1 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: D
D is the right answer.
upvoted 2 times
...
Th3N1c3Guy
1 year, 1 month ago
Selected Answer: B
since compute engine is being used, seems like GPU upgrade makes sense
upvoted 2 times
...
baimus
1 year, 2 months ago
Selected Answer: D
The difficulty of this question is its pure ambiguity. Two of the answers DO change the hardware, so this is obviously an option. The distribute strategy is clearly the right choice (D), assuming we are allowed more hardware to distribute over. People are saying "we cannot change the hardware, so it's B", but B is a change of hardware to a TPU anyway, which would require a code change, at which point D would be implemented anyway.
upvoted 4 times
...
MultiCloudIronMan
1 year, 2 months ago
Selected Answer: D
I have seen two or even three versions of this question and there are strong debates on the answer. I suggest D, because distributed training can work with your setup of 32 cores, 128 GB of RAM, and 1 NVIDIA P100 GPU. However, the efficiency and performance will depend on the specific framework and strategy you use. The important thing about this answer is that it does not affect quality.
upvoted 4 times
...
Jason_Cloud_at
1 year, 2 months ago
Selected Answer: B
The question says 3 million X-rays at 2 GB each, which adds up to roughly 6 million GB (about 6 PB). TPUs are designed precisely to accelerate ML tasks with massive parallelism, so I would go with B. I would directly omit A and C, since C is about preventing overfitting and is not directly aimed at reducing training time. D is a viable solution, but compared with B it is not the first choice.
upvoted 2 times
...
dija123
1 year, 4 months ago
Selected Answer: B
Agree with B
upvoted 2 times
...
inc_dev_ml_001
1 year, 6 months ago
Selected Answer: B
I would say B:
A. Increasing memory doesn't necessarily speed up the process; it's not a batch-size problem.
B. This seems to be an image -> TensorFlow situation. Transforming images into tensors means a TPU works better, and likely faster.
C. It's not an overfitting problem.
D. Same here, it's not a memory or input-size problem.
upvoted 4 times
...
pinimichele01
1 year, 6 months ago
https://www.tensorflow.org/guide/distributed_training#onedevicestrategy
upvoted 1 times
pinimichele01
1 year, 6 months ago
https://www.tensorflow.org/guide/distributed_training#onedevicestrategy -> D
upvoted 1 times
...
...
Werner123
1 year, 8 months ago
Selected Answer: D
In my eyes the only solution is distributed training. 3,000,000 × 2 GB = 6 petabytes worth of data. No single device will get you there.
upvoted 4 times
...
ludovikush
1 year, 8 months ago
Selected Answer: B
Agree with JamesDoes
upvoted 2 times
...
Mickey321
1 year, 12 months ago
Selected Answer: B
B, as there is only one GPU; hence with D, distributed training would not be efficient.
upvoted 4 times
...
pico
1 year, 12 months ago
If the question didn't specify the framework used, and you want to choose an option that is more framework-agnostic, it's important to consider the available options. Given the context and the need for a framework-agnostic approach, you might consider a combination of options A and D. Increasing instance memory and batch size can still be beneficial, and if you're using a deep learning framework that supports distributed training (like TensorFlow or PyTorch), implementing distributed training (option D) can further accelerate the process.
upvoted 1 times
...
Krish6488
1 year, 12 months ago
Selected Answer: B
I would go with B, as a v3-32 TPU offers much more computational power than a single P100 GPU, and this upgrade should provide a substantial decrease in training time. tf.distribute.Strategy is good for distributed training on multiple GPUs or TPUs, but the current setup has just one GPU, which makes it the second-best option unless the architecture uses multiple GPUs. An increase in memory may allow a larger batch size but won't address the fundamental problem, which is the over-utilised GPU. Early stopping is good for avoiding overfitting once the model already performs at its best; it can reduce overall training time but won't improve the training speed.
upvoted 5 times
...
pico
2 years, 1 month ago
Selected Answer: B
Given the options and the goal of decreasing training time, options B (using TPUs) and D (distributed training) are the most effective ways to achieve this goal C. Enable early stopping in your Vertex AI Training job: Early stopping is a technique that can help save training time by monitoring a validation metric and stopping the training process when the metric stops improving. While it can help in terms of stopping unnecessary training runs, it may not provide as substantial a speedup as other options.
upvoted 3 times
tavva_prudhvi
2 years ago
TPUs (Tensor Processing Units) are Google's custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads. They are often faster than GPUs for specific types of computations. However, not all models or training pipelines will benefit from TPUs, and they might require code modification to fully utilize the TPU capabilities.
upvoted 1 times
...
...
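A recurring point in this thread is the raw data volume, which motivates both the TPU and distributed-training answers. The back-of-envelope arithmetic several commenters cite is easy to verify:

```python
# Back-of-envelope check of the dataset size from the question:
# 3 million X-ray images at roughly 2 GB each.
images = 3_000_000
gb_per_image = 2
total_gb = images * gb_per_image
total_pb = total_gb / 1_000_000  # 1 PB = 1,000,000 GB (decimal units)
print(total_pb)  # 6.0 -> roughly 6 petabytes, far beyond a single P100
```

Whether one prefers B or D, data at this scale is the reason a single-GPU instance trains slowly in the first place.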

Topic 1 Question 97


You are a data scientist at an industrial equipment manufacturing company. You are developing a regression model to estimate the power consumption in the company’s manufacturing plants based on sensor data collected from all of the plants. The sensors collect tens of millions of records every day. You need to schedule daily training runs for your model that use all the data collected up to the current date. You want your model to scale smoothly and require minimal development work. What should you do?

  • A. Train a regression model using AutoML Tables.
  • B. Develop a custom TensorFlow regression model, and optimize it using Vertex AI Training.
  • C. Develop a custom scikit-learn regression model, and optimize it using Vertex AI Training.
  • D. Develop a regression model using BigQuery ML.
Suggested Answer: D 🗳️

Comments

niketd
Highly Voted 2 years, 7 months ago
Selected Answer: D
The key is to understand the amount of data that needs to be used for training - the sensor collects tens of millions of records every day and the model needs to use all the data up to the current date. There is a limitation for AutoML is 100M rows -> https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/prepare-data
upvoted 18 times
...
OpenKnowledge
Most Recent 3 weeks, 6 days ago
Selected Answer: D
AutoML has limitations on the number of rows it can use. BQML is a low-code ML solution.
upvoted 1 times
...
Laur_C
11 months ago
Selected Answer: A
Old question: the quota for AutoML (now Vertex AI) is between 1,000 and 200,000,000 rows, so it should be able to handle this well. Plus, "minimal development work" is usually a keyword for AutoML (Vertex AI).
upvoted 3 times
Laur_C
11 months ago
AutoML model limits: https://cloud.google.com/vertex-ai/docs/quotas#tabular_1
upvoted 2 times
...
...
pinimichele01
1 year, 6 months ago
Selected Answer: D
There is a limitation for AutoML is 100M rows -> https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/prepare-data
upvoted 1 times
...
vale_76_na_xxx
1 year, 11 months ago
I go for A
upvoted 2 times
pinimichele01
1 year, 6 months ago
There is a limitation for AutoML is 100M rows -> https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/prepare-data
upvoted 1 times
...
...
Mickey321
1 year, 12 months ago
Selected Answer: A
Either A or D. Since it is not stated where the sensor data is stored, I go for A.
upvoted 2 times
...
PST21
2 years, 4 months ago
Ans D. BigQuery ML allows you to schedule daily training runs by incorporating the latest data collected up to the current date. By specifying the appropriate SQL query, you can include all the relevant data in the training process, ensuring that your model is updated regularly.
upvoted 2 times
maukaba
2 years ago
it says "use all the data collected up to the current date" not a just a selection of "relevant" (?!) data
upvoted 1 times
...
...
ggwp1999
2 years, 6 months ago
Selected Answer: A
I would go with A because the question states that minimal development work is required. Not sure though, correct me if I'm wrong.
upvoted 4 times
...
M25
2 years, 6 months ago
Selected Answer: D
Went with D
upvoted 1 times
...
JamesDoe
2 years, 7 months ago
Selected Answer: A
Old question, the quotas were removed when they moved AutoML into VertexAI. https://cloud.google.com/vertex-ai/docs/quotas#model_quotas#tabular
upvoted 3 times
...
Yajnas_arpohc
2 years, 7 months ago
Would go w A given the specifics mentioned in question. BigQuery is an unnecessary distraction IMO (e.g. why would we assume BigQuery and not BigTable!)
upvoted 2 times
...
TNT87
2 years, 8 months ago
Selected Answer: D
Answer D https://cloud.google.com/blog/products/data-analytics/automl-tables-now-generally-available-bigquery-ml This legacy version of AutoML Tables is deprecated and will no longer be available on Google Cloud after January 23, 2024. All the functionality of legacy AutoML Tables and new features are available on the Vertex AI platform. See Migrate to Vertex AI to learn how to migrate your resources.
upvoted 2 times
...
FherRO
2 years, 8 months ago
Selected Answer: A
You require minimal development work and the question doesn't mention if your data is stored in BQ
upvoted 1 times
...
Ade_jr
2 years, 10 months ago
Selected Answer: D
Answer is D; AutoML has a 200M-row limit.
upvoted 3 times
...
ares81
2 years, 10 months ago
Selected Answer: A
A and D both seem good, but A works better for me.
upvoted 1 times
...
mymy9418
2 years, 10 months ago
Selected Answer: A
But BQML also has limits on training data https://cloud.google.com/bigquery-ml/quotas
upvoted 2 times
...
hiromi
2 years, 10 months ago
Selected Answer: D
Vote for D. A doesn't work because AutoML has limits on training data - https://www.examtopics.com/exams/google/professional-machine-learning-engineer/view/10/
upvoted 3 times
behzadsw
2 years, 10 months ago
Wrong. The limit is 200 M records. We have 10M records. see: https://cloud.google.com/automl-tables/docs/quotas
upvoted 1 times
adarifian
2 years, 7 months ago
it's more than 10M. the training needs to use all the data collected up to the current date
upvoted 2 times
...
...
...
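Option D ultimately boils down to a single SQL statement that BigQuery ML re-runs on schedule, retraining on whatever rows exist at run time — which is why it scales with daily-growing data and needs minimal development work. A sketch of assembling such a statement; the dataset, table, and column names below are hypothetical:

```python
# Hedged sketch: the CREATE OR REPLACE MODEL statement that a scheduled
# query could run daily. Table and column names are made up for
# illustration; the BigQuery ML options shown (model_type,
# input_label_cols) are standard CREATE MODEL options.
def build_training_sql(dataset: str, table: str, label: str) -> str:
    return f"""
CREATE OR REPLACE MODEL `{dataset}.power_model`
OPTIONS (model_type = 'linear_reg', input_label_cols = ['{label}']) AS
SELECT *
FROM `{dataset}.{table}`
WHERE ts <= CURRENT_TIMESTAMP()
""".strip()

sql = build_training_sql("plant_data", "sensor_readings", "power_kwh")
print(sql)
```

Because the SELECT pulls everything up to the current timestamp, each scheduled run automatically trains on all data collected so far, with no pipeline code to maintain.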

Topic 1 Question 98


You built a custom ML model using scikit-learn. Training time is taking longer than expected. You decide to migrate your model to Vertex AI Training, and you want to improve the model’s training time. What should you try out first?

  • A. Migrate your model to TensorFlow, and train it using Vertex AI Training.
  • B. Train your model in a distributed mode using multiple Compute Engine VMs.
  • C. Train your model with DLVM images on Vertex AI, and ensure that your code utilizes NumPy and SciPy internal methods whenever possible.
  • D. Train your model using Vertex AI Training with GPUs.
Suggested Answer: C 🗳️

Comments

Fer660
2 months, 2 weeks ago
Selected Answer: D
Not A: We should not be required to port our model.
Not B: Part of the benefit of moving to Vertex AI should be in not managing VMs directly.
Not C: "ensure that your code utilizes NumPy and SciPy internal methods whenever possible" is the tell. So what if I am using something else? Then the solution does not work? That can't be a robust approach.
D: Seems reasonable to expect a speed-up from a GPU.
upvoted 2 times
Fer660
2 months, 2 weeks ago
I was wrong. GPUs do not help sk-learn. Answer is C after all.
upvoted 1 times
...
...
bc3f222
8 months ago
Selected Answer: C
It's scikit-learn, so option D is no good. Option C, training with DLVM images on Vertex AI and optimizing code with NumPy and SciPy, is more appropriate in this scenario.
upvoted 1 times
...
desertlotus1211
8 months, 3 weeks ago
Selected Answer: B
Scikit-learn models are typically CPU-based, and many of their algorithms can benefit from parallelization when the workload is distributed
upvoted 1 times
...
rajshiv
11 months, 1 week ago
Selected Answer: D
DLVM images are typically designed for deep-learning workloads and do not provide as much benefit for scikit-learn training. Utilizing GPUs for acceleration is best, as scikit-learn can benefit from GPU-accelerated libraries.
upvoted 3 times
...
FireAtMe
11 months, 2 weeks ago
Selected Answer: A
D is wrong. Not every model in scikit-learn needs GPUs or gradients.
upvoted 1 times
FireAtMe
11 months, 2 weeks ago
C; I chose the wrong number.
upvoted 1 times
...
...
AB_C
11 months, 2 weeks ago
Selected Answer: D
GPU Acceleration: Scikit-learn can leverage GPUs for certain algorithms, especially those involving matrix operations, which are common in many machine learning models. GPUs excel at parallel processing, significantly reducing training time compared to CPUs.
Vertex AI Training: Vertex AI Training makes it easy to use GPUs. You can specify the type and number of GPUs in your training job configuration, and Vertex AI handles the infrastructure setup.
Minimal Code Changes: You might need to make minor adjustments to your code to ensure it utilizes the GPU, but generally, scikit-learn integrates well with GPUs.
upvoted 2 times
...
pico
1 year, 12 months ago
Selected Answer: D
Options B and C may also be relevant in certain scenarios, but they are generally more involved and might require additional considerations. Option B can be effective for large-scale training tasks, but it might add complexity and overhead. Option C could be helpful, but the impact on training time might not be as immediate and substantial as using GPUs.
upvoted 2 times
...
pico
2 years, 1 month ago
Selected Answer: D
D: Training your model with GPUs can provide a substantial speedup, especially for deep learning models or models that require a lot of computation. This option is likely to have a significant impact on training time. NOT C: While optimizing code can help improve training time to some extent, it may not provide as significant a speedup as the other options. However, it's still a good practice to optimize your code.
upvoted 1 times
...
andresvelasco
2 years, 2 months ago
Selected Answer: C
I don't think scikit-learn supports GPUs or distribution, so based on "What should you try out first?" I think C: train your model with DLVM images on Vertex AI, and ensure that your code utilizes NumPy and SciPy internal methods whenever possible.
upvoted 3 times
...
blobfishtu
2 years, 4 months ago
why not B? Vertex AI provides the ability to distribute training tasks across multiple Compute Engine VMs, which can parallelize the workload and significantly reduce the training time for large datasets and complex models.
upvoted 4 times
...
PST21
2 years, 4 months ago
Option D is not the optimal choice for a scikit-learn model since scikit-learn does not have native GPU support. Option C, training with DLVM images on Vertex AI and optimizing code with NumPy and SciPy, would be more appropriate in your scenario.
upvoted 2 times
...
PST21
2 years, 4 months ago
Ans - D. For the quickest improvement in training time with minimal modifications to your existing scikit-learn model, trying out option D and training your model using Vertex AI Training with GPUs is the recommended first step.
upvoted 1 times
...
Scipione_
2 years, 5 months ago
Selected Answer: C
A) Migrate your model to TensorFlow, and train it using Vertex AI Training: not the first thing to do.
B) Train your model in a distributed mode using multiple Compute Engine VMs: could be neither easy nor fast.
D) Train your model using Vertex AI Training with GPUs: sklearn does not support GPUs. Also, most of scikit-learn assumes data is in NumPy arrays or SciPy sparse matrices of a single numeric dtype.
I choose C as the correct answer.
upvoted 4 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
TNT87
2 years, 6 months ago
Selected Answer: C
Answer C
upvoted 1 times
...
guilhermebutzke
2 years, 8 months ago
What about using sklearn's multi-core support? Considering multiple jobs, could we choose item B? https://machinelearningmastery.com/multi-core-machine-learning-in-python/
upvoted 1 times
...
enghabeth
2 years, 9 months ago
Selected Answer: C
https://scikit-learn.org/stable/faq.html#will-you-add-gpu-support
upvoted 1 times
...
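Option C's advice to "utilize NumPy and SciPy internal methods whenever possible" is about keeping inner loops in compiled code rather than the Python interpreter. The toy comparison below uses only the standard library to illustrate that principle (it is a sketch of the idea, not of scikit-learn itself): the same reduction once as an interpreted Python loop and once via the C-implemented builtin `sum()`:

```python
# Illustration of the principle behind option C: the same reduction,
# once as an interpreted Python loop and once via the C-implemented
# builtin sum(). Vectorized NumPy/SciPy calls win for the same reason:
# the inner loop stays out of the interpreter.
import timeit

data = list(range(100_000))

def python_loop_sum(xs):
    total = 0
    for x in xs:       # every iteration pays interpreter overhead
        total += x
    return total

def builtin_sum(xs):
    return sum(xs)     # one call; the loop runs in C

assert python_loop_sum(data) == builtin_sum(data)

loop_t = timeit.timeit(lambda: python_loop_sum(data), number=20)
c_t = timeit.timeit(lambda: builtin_sum(data), number=20)
print(f"python loop: {loop_t:.3f}s  builtin sum: {c_t:.3f}s")
```

The gap is typically an order of magnitude, which is why rewriting hot paths onto NumPy/SciPy internals is a sensible first optimization before reaching for GPUs that scikit-learn cannot use anyway.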

Topic 1 Question 99


You are an ML engineer at a travel company. You have been researching customers’ travel behavior for many years, and you have deployed models that predict customers’ vacation patterns. You have observed that customers’ vacation destinations vary based on seasonality and holidays; however, these seasonal variations are similar across years. You want to quickly and easily store and compare the model versions and performance statistics across years. What should you do?

  • A. Store the performance statistics in Cloud SQL. Query that database to compare the performance statistics across the model versions.
  • B. Create versions of your models for each season per year in Vertex AI. Compare the performance statistics across the models in the Evaluate tab of the Vertex AI UI.
  • C. Store the performance statistics of each pipeline run in Kubeflow under an experiment for each season per year. Compare the results across the experiments in the Kubeflow UI.
  • D. Store the performance statistics of each version of your models using seasons and years as events in Vertex ML Metadata. Compare the results across the slices.
Suggested Answer: D 🗳️

Comments

Pau1234
11 months, 1 week ago
Selected Answer: B
https://cloud.google.com/vertex-ai/docs/evaluation/introduction
upvoted 3 times
...
pinimichele01
1 year, 6 months ago
Selected Answer: B
https://cloud.google.com/vertex-ai/docs/model-registry/versioning Model versioning lets you create multiple versions of the same model. With model versioning, you can organize your models in a way that helps navigate and understand which changes had what effect on the models. With Vertex AI Model Registry you can view your models and all of their versions in a single view. You can drill down into specific model versions and see exactly how they performed.
upvoted 1 times
...
gscharly
1 year, 6 months ago
Selected Answer: B
agree with pico
upvoted 1 times
...
Mickey321
1 year, 12 months ago
Selected Answer: B
Either B or D, so leaning towards B.
upvoted 1 times
...
pico
2 years, 1 month ago
Selected Answer: B
Vertex AI provides a managed environment for machine learning, and creating model versions for each season per year is a structured way to organize and compare models. You can use the Evaluate tab to compare performance metrics easily. This approach is well-suited for the task.
upvoted 2 times
pico
2 years, 1 month ago
not D: Vertex ML Metadata is designed for tracking metadata and lineage in machine learning pipelines. While it can store model version information and performance statistics, it might not provide as straightforward a way to compare models across years and seasons as Vertex AI's model versioning and evaluation tools.
upvoted 1 times
...
...
andresvelasco
2 years, 2 months ago
Selected Answer: D
I absolutely do not master this topic, but I would say the correct answer is D. It does not sound right to systematically create versions of a model based on seasonality if the model has not changed. "Events" in metadata sound right.
upvoted 2 times
...
PST21
2 years, 4 months ago
Ans D- With Vertex ML Metadata, you can store the performance statistics of each version of your models as events. You can associate these events with specific seasons and years, making it easy to organize and retrieve the data based on the relevant time periods. By storing performance statistics as events, you can capture the necessary information for comparing model versions across years.
upvoted 3 times
...
Voyager2
2 years, 5 months ago
Selected Answer: D
D. Store the performance statistics of each version of your models using seasons and years as events in Vertex ML Metadata. Compare the results across the slices. https://cloud.google.com/vertex-ai/docs/ml-metadata/analyzing#filtering Which versions of a trained model achieved a certain quality threshold?
upvoted 2 times
pico
2 years, 1 month ago
https://cloud.google.com/vertex-ai/docs/evaluation/using-model-evaluation#console
upvoted 1 times
...
...
M25
2 years, 6 months ago
Selected Answer: D
Went with D
upvoted 1 times
iskorini
2 years, 5 months ago
why choose D instead of B?
upvoted 1 times
...
...
CloudKida
2 years, 6 months ago
Selected Answer: B
https://cloud.google.com/vertex-ai/docs/model-registry/versioning Model versioning lets you create multiple versions of the same model. With model versioning, you can organize your models in a way that helps navigate and understand which changes had what effect on the models. With Vertex AI Model Registry you can view your models and all of their versions in a single view. You can drill down into specific model versions and see exactly how they performed.
upvoted 1 times
...
Yajnas_arpohc
2 years, 7 months ago
Selected Answer: B
You can compare evaluation results across different models, model versions, and evaluation jobs --> https://cloud.google.com/vertex-ai/docs/evaluation/using-model-evaluation Metadata mgmt has a very different purpose
upvoted 1 times
...
TNT87
2 years, 8 months ago
Selected Answer: D
Answer D
upvoted 1 times
...
hiromi
2 years, 10 months ago
Selected Answer: D
D - https://cloud.google.com/vertex-ai/docs/ml-metadata/introduction
upvoted 2 times
...
mil_spyro
2 years, 10 months ago
Selected Answer: D
Vote D. It is easy to compare via Vertex ML Metadata UI the performance statistics across the different slices and see how the model performance varies over time.
upvoted 3 times
...
mymy9418
2 years, 10 months ago
Selected Answer: D
i think it is D
upvoted 1 times
...
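Whichever of B or D one prefers, the underlying task is the same: key each model version's performance statistics by (year, season) and compare the same season across years. A pure-Python sketch of that bookkeeping follows; in Vertex AI these records would live in ML Metadata events or Model Registry version metadata, and every name and number below is hypothetical:

```python
# Hedged sketch: storing per-version performance statistics keyed by
# season and year, then comparing one season across years, as the
# question asks. The metric values are made-up placeholders.
from collections import defaultdict

metrics = defaultdict(dict)  # season -> {year: rmse}

def log_run(year: int, season: str, rmse: float) -> None:
    # In Vertex ML Metadata this would be an event attached to the
    # model version produced by that season/year's training run.
    metrics[season][year] = rmse

def compare_season(season: str) -> list:
    # Same season across years, best (lowest RMSE) first.
    return sorted(metrics[season].items(), key=lambda kv: kv[1])

log_run(2022, "summer", 0.41)
log_run(2023, "summer", 0.35)
log_run(2024, "summer", 0.38)
print(compare_season("summer"))  # 2023's model performed best
```

The debate above is really about which Vertex AI surface holds this table for you: the Evaluate tab on model versions (B) or ML Metadata events queried by slice (D).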

Topic 1 Question 100


You are an ML engineer at a manufacturing company. You need to build a model that identifies defects in products based on images of the product taken at the end of the assembly line. You want your model to preprocess the images with lower computation to quickly extract features of defects in products. Which approach should you use to build the model?

  • A. Reinforcement learning
  • B. Recommender system
  • C. Recurrent Neural Networks (RNN)
  • D. Convolutional Neural Networks (CNN)
Show Suggested Answer Hide Answer
Suggested Answer: D 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
bc3f222
8 months, 3 weeks ago
Selected Answer: D
If the input is an image, use a CNN. The other options are not suited to image problems: an RNN is sequential, so it is used for time series or, as an LSTM, for text classification.
upvoted 1 times
...
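As an aside on why convolution is the cheap feature extractor here: a kernel slides a small window over the image and responds strongly to local patterns such as edges, which is how defects typically show up. A minimal pure-Python sketch (the 4x4 image and the Sobel-style kernel are made up for illustration; real CNNs learn their kernels from data):

```python
def conv2d(image, kernel):
    """Valid-mode 2D convolution (really cross-correlation, as in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# 4x4 "product image" with a sharp vertical edge between columns 1 and 2.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Sobel-style vertical edge detector: responds where intensity jumps.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
features = conv2d(image, kernel)  # every output cell straddles the edge
```

Each output value only depends on a 3x3 window, which is why this kind of feature extraction is computationally cheap compared to fully connected layers.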
MultiCloudIronMan
1 year, 1 month ago
Selected Answer: D
CNNs are commonly used for image classification
upvoted 3 times
...
Scipione_
1 year, 11 months ago
Selected Answer: D
D for sure
upvoted 1 times
...
M25
2 years ago
Selected Answer: D
Went with D
upvoted 1 times
...
TNT87
2 years, 2 months ago
Selected Answer: D
Answer D
upvoted 1 times
...
FherRO
2 years, 2 months ago
Selected Answer: D
CNNs commonly used for image classification and recognition tasks.
upvoted 1 times
...
FherRO
2 years, 2 months ago
Selected Answer: D
CNN scenario
upvoted 1 times
...
enghabeth
2 years, 3 months ago
Selected Answer: D
best way
upvoted 1 times
...
hiromi
2 years, 4 months ago
Selected Answer: D
D CNN is good for images processing - https://developers.google.com/machine-learning/practica/image-classification/convolutional-neural-networks
upvoted 1 times
...
ares81
2 years, 4 months ago
Selected Answer: D
Obviously D.
upvoted 2 times
...

Topic 1 Question 101

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 101 discussion

You are developing an ML model intended to classify whether X-ray images indicate bone fracture risk. You have trained a ResNet architecture on Vertex AI using a TPU as an accelerator; however, you are unsatisfied with the training time and memory usage. You want to quickly iterate your training code but make minimal changes to the code. You also want to minimize impact on the model’s accuracy. What should you do?

  • A. Reduce the number of layers in the model architecture.
  • B. Reduce the global batch size from 1024 to 256.
  • C. Reduce the dimensions of the images used in the model.
  • D. Configure your model to use bfloat16 instead of float32.
Show Suggested Answer Hide Answer
Suggested Answer: D 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
mymy9418
Highly Voted 2 years, 4 months ago
i think should be D https://cloud.google.com/tpu/docs/bfloat16
upvoted 8 times
...
OpenKnowledge
Most Recent 1 month, 1 week ago
Selected Answer: D
All options except D will impact the model accuracy
upvoted 2 times
...
fitri001
1 year ago
Selected Answer: D
Configuring bfloat16 instead of float32 (D): This offers a good balance between speed, memory usage, and minimal code changes. Bfloat16 uses 16 bits per float value compared to 32 bits for float32. This can significantly reduce memory usage while maintaining similar accuracy in many machine learning models, especially for image recognition tasks. It's a quick change with minimal impact on the code and potentially large gains in training speed.
upvoted 3 times
...
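The key point, that bfloat16 keeps float32's full 8-bit exponent (and hence its dynamic range) while giving up mantissa bits, can be demonstrated with a stdlib sketch. This simulates bfloat16 by truncating the low 16 bits of a float32 encoding, which is a simplification, not how TPU hardware actually rounds:

```python
import struct

def to_bfloat16(x: float) -> float:
    """Simulate bfloat16: keep only the top 16 bits of the float32 encoding
    (sign bit, the full 8-bit exponent, and 7 mantissa bits)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# Same exponent as float32, so huge values stay finite (no overflow to inf)...
big = to_bfloat16(3.0e38)
# ...but only 7 mantissa bits (about 2-3 decimal digits) of precision survive.
coarse = to_bfloat16(1.2345678)  # 1.234375
```

This is why switching a ResNet from float32 to bfloat16 halves activation memory with little accuracy impact: image models care far more about range than about the last few mantissa bits.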
pinimichele01
1 year ago
Selected Answer: D
"the Google hardware team chose bfloat16 for Cloud TPUs to improve hardware efficiency while maintaining the ability to train deep learning models accurately, all with minimal switching costs from float32"
upvoted 3 times
...
pico
1 year, 8 months ago
Selected Answer: B
while reducing the global batch size (Option B) and configuring your model to use bfloat16 (Option D) are both valid options, reducing the global batch size is typically a safer and more straightforward choice to quickly iterate and make minimal changes to your code while still achieving reasonable model performance.
upvoted 1 times
pico
1 year, 8 months ago
Why not D: Numerical precision: bfloat16 has lower numerical precision than float32. Compatibility: not all machine learning frameworks and libraries support bfloat16 natively. Hyperparameter tuning: when switching to bfloat16, you may need to adjust hyperparameters, such as learning rates and gradient clipping thresholds, to accommodate the lower numerical precision. Model architecture: some model architectures and layers may be more sensitive to reduced precision than others.
upvoted 1 times
tavva_prudhvi
1 year, 6 months ago
TPUs are optimized for operations with bfloat16 data types. By switching from float32 to bfloat16, you can benefit from the TPU's hardware acceleration capabilities, leading to faster computation and reduced memory usage without significant changes to your code. While bfloat16 offers a lower precision compared to float32, it maintains a similar dynamic range. This means that the reduction in numerical precision is unlikely to have a substantial impact on the accuracy of your model, especially in the context of image classification tasks like bone fracture risk assessment in X-rays. While reducing the batch size can decrease memory usage, it can also affect the model's convergence and accuracy. Additionally, TPUs are highly efficient with large batch sizes, so reducing the batch size might not fully leverage the TPU's capabilities.
upvoted 4 times
...
...
...
Voyager2
1 year, 11 months ago
Selected Answer: D
I think it should be D since they are using a TPU.https://cloud.google.com/tpu/docs/bfloat16
upvoted 2 times
...
M25
2 years ago
Selected Answer: D
Went with D
upvoted 1 times
...
tavva_prudhvi
2 years, 1 month ago
Selected Answer: D
https://cloud.google.com/tpu/docs/bfloat16
upvoted 1 times
...
TNT87
2 years, 2 months ago
Selected Answer: D
Answer D
upvoted 2 times
...
ailiba
2 years, 2 months ago
"the Google hardware team chose bfloat16 for Cloud TPUs to improve hardware efficiency while maintaining the ability to train deep learning models accurately, all with minimal switching costs from float32" so since its already trained on TPU, D maybe has no effect?
upvoted 3 times
...
John_Pongthorn
2 years, 3 months ago
Selected Answer: D
I go with D exactly, primarily. the rest don't make any sense at all
upvoted 2 times
...
ares81
2 years, 4 months ago
Selected Answer: D
It should be D.
upvoted 1 times
...
hiromi
2 years, 4 months ago
Selected Answer: D
D Agree with mymy9418
upvoted 2 times
...
mil_spyro
2 years, 4 months ago
Selected Answer: D
Agree with D
upvoted 1 times
...
ares81
2 years, 4 months ago
Selected Answer: B
It should be B.
upvoted 1 times
...

Topic 1 Question 102

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 102 discussion

You have successfully deployed to production a large and complex TensorFlow model trained on tabular data. You want to predict the lifetime value (LTV) field for each subscription stored in the BigQuery table named subscription.subscriptionPurchase in the project named my-fortune500-company-project.

You have organized all your training code, from preprocessing data from the BigQuery table up to deploying the validated model to the Vertex AI endpoint, into a TensorFlow Extended (TFX) pipeline. You want to prevent prediction drift, i.e., a situation when a feature data distribution in production changes significantly over time. What should you do?

  • A. Implement continuous retraining of the model daily using Vertex AI Pipelines.
  • B. Add a model monitoring job where 10% of incoming predictions are sampled every 24 hours.
  • C. Add a model monitoring job where 90% of incoming predictions are sampled every 24 hours.
  • D. Add a model monitoring job where 10% of incoming predictions are sampled every hour.
Show Suggested Answer Hide Answer
Suggested Answer: B 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
dija123
1 month, 1 week ago
Selected Answer: D
24-hour window is too infrequent for proactively preventing drift's negative effects.
upvoted 1 times
...
Antmal
3 months, 1 week ago
Selected Answer: B
Option B's configuration represents the ideal balance for a high-stakes LTV model in a high-traffic environment. Why: 1) a 10% sampling rate is highly cost-effective while providing more than enough data for statistically sound drift detection; 2) a 24-hour monitoring window provides a stable, reliable signal by smoothing out normal diurnal variations in user behaviour, thus minimising false positives and alert fatigue.
upvoted 1 times
...
vini123
9 months, 1 week ago
Selected Answer: B
Subscription LTV data doesn’t change rapidly → Hourly checks (D) are unnecessary. Monitoring 10% of data per day (B) is sufficient → Detects drift while minimizing cost. Cost consideration → Hourly monitoring (D) increases expenses without significant added value for slow-changing data.
upvoted 4 times
...
f9bc58e
9 months, 3 weeks ago
Selected Answer: D
Sampling predictions every hour makes it possible to detect drift more quickly than daily sampling and to react earlier.
upvoted 2 times
...
phani49
10 months, 3 weeks ago
Selected Answer: D
Why D is correct: • Hourly monitoring ensures timely detection of prediction drift, which is critical in production systems. • Sampling 10% of predictions balances computational efficiency and detection accuracy. • Vertex AI model monitoring jobs support frequent sampling and provide detailed insights into feature distribution changes. A: Continuous retraining daily Daily retraining alone does not guarantee early detection of drift. Drift can happen and impact your predictions hours after your last retraining. Without monitoring, you might only discover the issue after a full day or more.
upvoted 4 times
...
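For intuition on what a monitoring job computes from its sampled predictions, here is a rough Population Stability Index (PSI) sketch in plain Python. PSI and the "PSI > 0.2 means significant drift" rule of thumb are common industry conventions, not a description of Vertex AI Model Monitoring's internals:

```python
import math

def psi(expected, actual, bins=4):
    """Population Stability Index between a baseline sample (e.g. training-time
    predictions) and a fresh sample. PSI > 0.2 is a common drift threshold."""
    lo = min(expected)
    hi = max(expected)
    width = (hi - lo) / bins or 1.0

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            idx = min(int((x - lo) / width), bins - 1)
            counts[max(idx, 0)] += 1
        # Additive smoothing so empty bins never produce log(0).
        return [(c + 0.5) / (len(xs) + 0.5 * bins) for c in counts]

    return sum((a - e) * math.log(a / e)
               for e, a in zip(hist(expected), hist(actual)))

baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]    # training-time scores
same     = [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.8]  # similar traffic
shifted  = [0.7, 0.75, 0.8, 0.85, 0.9, 0.9, 0.95, 1.0]      # drifted traffic
```

The sampling-rate/window debate in this thread is about how `actual` gets collected: a 10% sample per window is usually plenty for a statistic like this to separate `same` from `shifted`.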
f084277
12 months ago
Selected Answer: A
It says PREVENT with no other constraints.
upvoted 2 times
...
MultiCloudIronMan
1 year, 7 months ago
Selected Answer: B
You need to monitor first and foremost to see whether there is drift; if there is, a remediation can be devised. Retraining every day is overkill.
upvoted 4 times
...
pico
2 years, 1 month ago
Selected Answer: A
Continuous Retraining: Continuously retraining the model allows it to adapt to changes in the data distribution, helping to mitigate prediction drift. Daily retraining provides a good balance between staying up-to-date and avoiding excessive retraining. Options B, C, and D involve model monitoring but do not address the issue of keeping the model updated with the changing data distribution. Monitoring alone can help you detect drift, but it does not actively prevent it. Retraining the model is necessary to address drift effectively.
upvoted 3 times
Nish1729
1 year, 10 months ago
Follow me on X (Twitter): @nbcodes for more useful tips. I think you're slightly missing the point; the answer should be B, let me explain why. The whole point of this question is to come up with a PREVENTATIVE way of handling prediction drift, so you need a way to DETECT the drift before it causes harm. This is exactly what solution B does, in a way that is not too frequent (i.e., D) and not too resource-intensive with a large sample (i.e., C); remember, if sampling is done well you don't need 90% of the data to detect drift. Solution A suggests retraining every day, which is a CRAZY proposal: why would you retrain every day when you don't even know whether your data is drifting? A huge waste of resources and time.
upvoted 2 times
...
maukaba
2 years ago
Option A can prevent prediction drift. All the other options can only detect it. Therefore the correct answer is A, unless it is possible to monitor drift and then remediate without retraining.
upvoted 2 times
...
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 1 times
...
tavva_prudhvi
2 years, 7 months ago
Selected Answer: B
Continuous retraining (option A) is not necessarily the best solution for preventing prediction drift, as it can be time-consuming and expensive. Instead, monitoring the performance of the model in production is a better approach. Option B is a good choice because it samples a small percentage of incoming predictions and checks for any significant changes in the feature data distribution over a 24-hour period. This allows you to detect any drift and take appropriate action to address it before it affects the model's performance. Options C and D are less effective because they either sample too many or too few predictions and/or at too frequent intervals.
upvoted 4 times
andresvelasco
2 years, 2 months ago
I am just not sure why sampling too few (10%) is important. Is this a costly service?
upvoted 1 times
tavva_prudhvi
2 years ago
Model monitoring, especially at a large scale, can consume significant computational resources. Sampling a smaller percentage of predictions (like 10%) helps manage these resource demands and associated costs. The more predictions you sample, the more storage, computation, and network resources you'll need to analyze the data, potentially increasing the cost. In many cases, a 10% sample of the data can provide statistically significant insights into the model's performance and the presence of drift. It's a balancing act between getting enough data to make informed decisions and not overburdening the system. In some datasets, especially large ones, a lot of the data might be redundant or not particularly informative. Sampling a smaller fraction can help filter out noise and focus on the most relevant information.
upvoted 1 times
...
...
pico
1 year, 12 months ago
Neither B,C or D have a step to prevent the prediction drift. The question says: "you want to prevent prediction drift"
upvoted 2 times
...
...
TNT87
2 years, 8 months ago
Selected Answer: B
Answer B
upvoted 1 times
...
John_Pongthorn
2 years, 9 months ago
Selected Answer: B
B. I got it from the Machine Learning in the Enterprise course on Google Partner Skills Boost; watch the video "Model management using Vertex AI" carefully. I infer that this is the default setting for the typical case.
upvoted 3 times
...
behzadsw
2 years, 10 months ago
Selected Answer: D
Using 10% of hourly requests would yield a better distribution and a faster feedback loop.
upvoted 1 times
...
hargur
2 years, 10 months ago
I think it is B, we can say 10% to be a sample but not 90%
upvoted 2 times
...
mymy9418
2 years, 10 months ago
Selected Answer: B
I guess 10% of 24 hours should be good enough?
upvoted 3 times
...
hiromi
2 years, 10 months ago
Selected Answer: B
B (not sure) - https://cloud.google.com/vertex-ai/docs/model-monitoring/overview - https://cloud.google.com/vertex-ai/docs/model-monitoring/using-model-monitoring#drift-detection
upvoted 2 times
...

Topic 1 Question 103

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 103 discussion

You recently developed a deep learning model using Keras, and now you are experimenting with different training strategies. First, you trained the model using a single GPU, but the training process was too slow. Next, you distributed the training across 4 GPUs using tf.distribute.MirroredStrategy (with no other changes), but you did not observe a decrease in training time. What should you do?

  • A. Distribute the dataset with tf.distribute.Strategy.experimental_distribute_dataset
  • B. Create a custom training loop.
  • C. Use a TPU with tf.distribute.TPUStrategy.
  • D. Increase the batch size.
Show Suggested Answer Hide Answer
Suggested Answer: D 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
egdiaa
Highly Voted 2 years, 10 months ago
Selected Answer: D
Ans D: Check this link https://www.tensorflow.org/guide/gpu_performance_analysis for details on how to Optimize the performance on the multi-GPU single host
upvoted 11 times
...
desertlotus1211
Most Recent 8 months, 2 weeks ago
Selected Answer: D
When using distributed training with tf.distribute.MirroredStrategy, each GPU processes a slice of the batch. If you keep the batch size constant, each GPU receives a smaller effective batch, which might not fully utilize the computational power of each device. Increasing the batch size allows each GPU to process more data in parallel, which can lead to improved training speed and better resource utilization without modifying your training loop or switching strategies
upvoted 4 times
...
rajshiv
11 months, 1 week ago
Selected Answer: A
I will go with A. By using tf.distribute.Strategy.experimental_distribute_dataset we can ensure that the dataset is effectively split across the GPUs, which will help fully utilize the GPUs and achieve faster training times. Increasing the batch size can improve training performance on GPUs by allowing them to process more data in parallel. However, if the dataset is not properly distributed across GPUs, simply increasing the batch size won't lead to improved training times. In fact, using a larger batch size can lead to memory bottlenecks if not handled correctly. The key here is to first ensure proper data distribution before tweaking batch size.
upvoted 1 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: A
A is the right answer
upvoted 1 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: D
when using tf.distribute.MirroredStrategy, TensorFlow automatically takes care of distributing the dataset across the available devices (GPUs in this case). To make sure that the data is efficiently distributed across the GPUs, you should increase the global batch size. This ensures that each GPU receives a larger batch of data to process, effectively utilizing the additional computational power. The global batch size is the sum of the batch sizes for all devices. For example, if you had a batch size of 64 for a single GPU, you would set the global batch size to 256 (64 * 4) when using 4 GPUs.
upvoted 4 times
...
pico
1 year, 12 months ago
Selected Answer: A
When you distribute the training across multiple GPUs using tf.distribute.MirroredStrategy, the training time may not decrease if the dataset loading and preprocessing become a bottleneck. In this case, option A, distributing the dataset with tf.distribute.Strategy.experimental_distribute_dataset, can help improve the performance.
upvoted 3 times
pico
1 year, 12 months ago
option D can be a reasonable step to try, but it's important to carefully monitor the training process, consider memory constraints, and assess the impact on model performance. It might be a good idea to try both option A (distributing the dataset) and option D (increasing the batch size) to see if there is any improvement in training time.
upvoted 1 times
...
...
PST21
2 years, 3 months ago
A. Distribute the dataset with tf.distribute.Strategy.experimental_distribute_dataset When you distribute the training across multiple GPUs using tf.distribute.MirroredStrategy, you need to make sure that the data is also distributed across the GPUs to fully utilize the computational power. By default, the tf.distribute.MirroredStrategy replicates the model and uses synchronous training, but it does not automatically distribute the dataset across the GPUs.
upvoted 1 times
tavva_prudhvi
2 years ago
You are right. However, when using tf.distribute.MirroredStrategy, TensorFlow automatically takes care of distributing the dataset across the available devices (GPUs in this case). To make sure that the data is efficiently distributed across the GPUs, you should increase the global batch size. This ensures that each GPU receives a larger batch of data to process, effectively utilizing the additional computational power. The global batch size is the sum of the batch sizes for all devices. For example, if you had a batch size of 64 for a single GPU, you would set the global batch size to 256 (64 * 4) when using 4 GPUs.
upvoted 1 times
...
...
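The arithmetic behind this thread, sketched in plain Python (the batch sizes are illustrative):

```python
# With MirroredStrategy, the *global* batch is split evenly across replicas.
# Keeping the single-GPU batch size when moving to 4 GPUs leaves each GPU
# with a quarter of the work per step, plus full gradient-sync overhead,
# which is why training time did not improve.

def per_replica_batch(global_batch_size, num_gpus):
    return global_batch_size // num_gpus

single_gpu_batch = 256
num_gpus = 4

# Unchanged code: each of the 4 GPUs now sees only 64 examples per step.
unchanged = per_replica_batch(single_gpu_batch, num_gpus)

# Scaled code (option D): a global batch of 1024 keeps 256 examples
# per GPU per step, restoring full per-device utilization.
scaled_global = single_gpu_batch * num_gpus
scaled = per_replica_batch(scaled_global, num_gpus)
```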
CloudKida
2 years, 6 months ago
Selected Answer: D
When going from training with a single GPU to multiple GPUs on the same host, ideally you should experience the performance scaling with only the additional overhead of gradient communication and increased host thread utilization. Because of this overhead, you will not have an exact 2x speedup if you move from 1 to 2 GPUs. Try to maximize the batch size, which will lead to higher device utilization and amortize the costs of communication across multiple GPUs. Using the memory profiler helps get a sense of how close your program is to peak memory utilization. Note that while a higher batch size can affect convergence, this is usually outweighed by the performance benefits.
upvoted 2 times
...
M25
2 years, 6 months ago
Selected Answer: D
Went with D
upvoted 1 times
...
tavva_prudhvi
2 years, 7 months ago
Selected Answer: D
If distributing the training across multiple GPUs did not result in a decrease in training time, the issue may be related to the batch size being too small. When using multiple GPUs, each GPU gets a smaller portion of the batch size, which can lead to slower training times due to increased communication overhead. Therefore, increasing the batch size can help utilize the GPUs more efficiently and speed up training.
upvoted 3 times
...
TNT87
2 years, 8 months ago
Selected Answer: D
Answer D
upvoted 1 times
...
John_Pongthorn
2 years, 9 months ago
D is best. https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_keras_modelfit says: "Each epoch will then train faster as you add more GPUs. Typically, you would want to increase your batch size as you add more accelerators." C is ruled out because the question uses GPUs. A and B: as https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_custom_training_loops explains, if you are writing a custom training loop you need to call a few more methods: start by creating a tf.data.Dataset normally, then use tf.distribute.Strategy.experimental_distribute_dataset to convert the tf.data.Dataset to something that produces "per-replica" values. https://www.tensorflow.org/api_docs/python/tf/distribute/Strategy
upvoted 4 times
...
zeic
2 years, 10 months ago
Selected Answer: D
To speed up the training of the deep learning model, increase the batch size. When using multiple GPUs with tf.distribute.MirroredStrategy, increasing the batch size can help to better utilize the additional GPUs and potentially reduce the training time. This is because larger batch sizes allow each GPU to process more data in parallel, which can help to improve the efficiency of the training process.
upvoted 1 times
...
ares81
2 years, 10 months ago
Selected Answer: C
TPUs are Google's specialized ASICs designed to dramatically accelerate machine learning workloads. Hence it should be C.
upvoted 1 times
...
Nayak8
2 years, 10 months ago
Selected Answer: D
I think it's D
upvoted 1 times
...
MithunDesai
2 years, 10 months ago
Selected Answer: A
I think its A
upvoted 4 times
...
hiromi
2 years, 10 months ago
Selected Answer: B
B (not sure) - https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch -https://www.tensorflow.org/guide/distributed_training#use_tfdistributestrategy_with_custom_training_loops
upvoted 1 times
hiromi
2 years, 10 months ago
Sorry, ans D (by ediaa link)
upvoted 1 times
...
hiromi
2 years, 10 months ago
It's should A
upvoted 1 times
...
...

Topic 1 Question 104

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 104 discussion

You work for a gaming company that has millions of customers around the world. All games offer a chat feature that allows players to communicate with each other in real time. Messages can be typed in more than 20 languages and are translated in real time using the Cloud Translation API. You have been asked to build an ML system to moderate the chat in real time while assuring that the performance is uniform across the various languages and without changing the serving infrastructure.

You trained your first model using an in-house word2vec model for embedding the chat messages translated by the Cloud Translation API. However, the model has significant differences in performance across the different languages. How should you improve it?

  • A. Add a regularization term such as the Min-Diff algorithm to the loss function.
  • B. Train a classifier using the chat messages in their original language.
  • C. Replace the in-house word2vec with GPT-3 or T5.
  • D. Remove moderation for languages for which the false positive rate is too high.
Show Suggested Answer Hide Answer
Suggested Answer: B 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
TNT87
Highly Voted 2 years, 2 months ago
Selected Answer: B
Answer B Since the performance of the model varies significantly across different languages, it suggests that the translation process might have introduced some noise in the chat messages, making it difficult for the model to generalize across languages. One way to address this issue is to train a classifier using the chat messages in their original language.
upvoted 11 times
...
tavva_prudhvi
Highly Voted 2 years, 1 month ago
Selected Answer: B
Since the current model has significant differences in performance across the different languages, it is likely that the translations produced by the Cloud Translation API are not of uniform quality across all languages. Therefore, it would be best to train a classifier using the chat messages in their original language instead of relying on translations. This approach has several advantages. First, the model can directly learn the nuances of each language, leading to better performance across all languages. Second, it eliminates the need for translation, reducing the possibility of errors and improving the overall speed of the system. Finally, it is a relatively simple approach that can be implemented without changing the serving infrastructure.
upvoted 5 times
...
b7ad1d9
Most Recent 1 month, 3 weeks ago
Selected Answer: B
C is expensive. B is the best NLP solution: it addresses the translation problem (where the issue seems to lie) by replacing the Cloud Translation API step with a classifier trained on the original-language messages.
upvoted 2 times
...
desertlotus1211
8 months, 2 weeks ago
Selected Answer: C
The issue is with the language translation - GPT-3 or T5 are trained on large multilingual datasets and are designed to capture the nuances of multiple languages. By replacing your in-house word2vec model with one of these state-of-the-art models, you can leverage their robust, context-aware embeddings to achieve more uniform performance across various languages.
upvoted 2 times
...
Zwi3b3l
1 year, 3 months ago
Selected Answer: A
uniform performance
upvoted 1 times
pinimichele01
1 year ago
Adding a regularization term to the loss function can help prevent overfitting of the model, but it may not necessarily address the language-specific differences in performance. The Min-Diff algorithm is a type of regularization technique that aims to minimize the difference between the model predictions and the ground truth while ensuring that the model remains simple. While this can improve the generalization performance of the model, it may not be sufficient to address the language-specific differences in performance. Therefore, training a classifier using the chat messages in their original language can be a better solution to improve the performance of the moderation system across different languages.
upvoted 1 times
...
...
ciro_li
1 year, 9 months ago
Selected Answer: B
Min-Diff may reduce model unfairness, but here the concern is improving performance. Training models that avoid the Cloud Translation API should be more suitable.
upvoted 2 times
tavva_prudhvi
1 year, 9 months ago
Adding a regularization term to the loss function can help prevent overfitting of the model, but it may not necessarily address the language-specific differences in performance. The Min-Diff algorithm is a type of regularization technique that aims to minimize the difference between the model predictions and the ground truth while ensuring that the model remains simple. While this can improve the generalization performance of the model, it may not be sufficient to address the language-specific differences in performance. Therefore, training a classifier using the chat messages in their original language can be a better solution to improve the performance of the moderation system across different languages.
upvoted 1 times
...
...
friedi
1 year, 10 months ago
Selected Answer: A
A is correct, the key part of the question is „[…] assuring the performance is uniform […]“ which is baked into the Min-Diff regularisation: https://ai.googleblog.com/2020/11/mitigating-unfair-bias-in-ml-models.html
upvoted 2 times
...
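For intuition on option A: MinDiff augments the training loss with a penalty on the gap between score distributions of different slices (here, languages). A toy sketch using a squared gap between per-group mean scores; the real MinDiff library uses a kernel-based MMD loss, so this is illustrative only, and all the numbers below are made up:

```python
def min_diff_loss(task_loss, scores_by_group, weight=1.5):
    """Toy MinDiff-style objective: task loss plus a penalty on how far each
    language group's mean score sits from the cross-group mean."""
    means = [sum(s) / len(s) for s in scores_by_group]
    overall = sum(means) / len(means)
    penalty = sum((m - overall) ** 2 for m in means) / len(means)
    return task_loss + weight * penalty

# Toxicity scores for the same kind of borderline message in two languages;
# the model is far less confident on the machine-translated language.
scores_en = [0.80, 0.85, 0.90]
scores_tr = [0.40, 0.45, 0.50]

balanced   = min_diff_loss(0.3, [[0.80, 0.85], [0.78, 0.83]])
imbalanced = min_diff_loss(0.3, [scores_en, scores_tr])
```

Because the penalty grows with the cross-language gap, gradient descent is pushed toward more uniform performance across languages, which is exactly the requirement in the question.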
M25
2 years ago
Selected Answer: B
Went with B
upvoted 1 times
...
hakook
2 years, 2 months ago
Selected Answer: A
should be A https://ai.googleblog.com/2020/11/mitigating-unfair-bias-in-ml-models.html
upvoted 2 times
...
Ml06
2 years, 2 months ago
B, I think, is the correct answer. C is overkill: you have just developed your first model, so you don't jump to a solution like C. In addition, the problem is that there is a significant difference between languages, not that the model is enormously underperforming overall. Finally, you are serving millions of users; running GPT-3 or T5 for a task like chat moderation (and in real time) is extremely wasteful.
upvoted 3 times
...
John_Pongthorn
2 years, 3 months ago
Given that GPT-3 is a rival of Google's, C is certainly not the intended answer.
upvoted 3 times
John_Pongthorn
2 years, 3 months ago
We are dealing with moderation across more than 20 languages, so both false positives and false negatives are relevant.
upvoted 1 times
...
Fer660
2 months, 2 weeks ago
Agree, that should be a pretty clear signal that C is not the answer :)
upvoted 1 times
...
...
egdiaa
2 years, 4 months ago
Selected Answer: C
GPT-3 is best for generating human-like Text
upvoted 3 times
lightnessofbein
2 years, 2 months ago
Does "moderate" mean we need to generate text?
upvoted 2 times
desertlotus1211
8 months, 2 weeks ago
yes, what else would you generate when you need to communicate over a messaging system?
upvoted 1 times
...
...
...
kunal_18
2 years, 4 months ago
Ans : C https://towardsdatascience.com/poor-mans-gpt-3-few-shot-text-generation-with-t5-transformer-51f1b01f843e
upvoted 1 times
...

Topic 1 Question 105

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 105 discussion

You work for a gaming company that develops massively multiplayer online (MMO) games. You built a TensorFlow model that predicts whether players will make in-app purchases of more than $10 in the next two weeks. The model’s predictions will be used to adapt each user’s game experience. User data is stored in BigQuery. How should you serve your model while optimizing cost, user experience, and ease of management?

  • A. Import the model into BigQuery ML. Make predictions using batch reading data from BigQuery, and push the data to Cloud SQL
  • B. Deploy the model to Vertex AI Prediction. Make predictions using batch reading data from Cloud Bigtable, and push the data to Cloud SQL.
  • C. Embed the model in the mobile application. Make predictions after every in-app purchase event is published in Pub/Sub, and push the data to Cloud SQL.
  • D. Embed the model in the streaming Dataflow pipeline. Make predictions after every in-app purchase event is published in Pub/Sub, and push the data to Cloud SQL.
Suggested Answer: A 🗳️

Comments

hiromi
Highly Voted 2 years, 4 months ago
Selected Answer: A
it seens A (not sure) - https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-tensorflow
upvoted 12 times
...
bc3f222
Most Recent 8 months, 2 weeks ago
Selected Answer: A
The hint is "You built a TensorFlow model that predicts whether players will make in-app purchases of more than $10 in the next two weeks." This means that for this particular use case prediction is not real-time, and batch is in fact suitable. Furthermore, BQML allows you to load a TensorFlow model for serving. This makes BQML the best choice for cost considerations.
upvoted 4 times
...
desertlotus1211
8 months, 2 weeks ago
Selected Answer: C
It's an online gaming service; you want to stream data in real time, not batch process it.
upvoted 1 times
desertlotus1211
8 months, 2 weeks ago
My mistake - I meant to click D.
upvoted 2 times
...
...
NamitSehgal
8 months, 3 weeks ago
Selected Answer: B
While BigQuery ML is a useful tool for certain machine learning tasks, it's not the right tool for serving a complex TensorFlow model and integrating it into a game's user-experience adaptation system. Vertex AI Prediction is a better choice for this scenario due to its superior support for serving complex models, its optimized serving infrastructure, and its ease of management.
upvoted 1 times
...
phani49
10 months, 3 weeks ago
Selected Answer: D
Why D is the best choice: it provides real-time predictions, which is crucial for a good user experience in an MMO setting; it leverages Google Cloud's managed services (Dataflow, Pub/Sub, Cloud SQL) to reduce operational overhead and simplify management; it allows you to centrally manage your model and easily update it without requiring changes to client applications; and it optimizes cost by using a pay-as-you-go, autoscaling service rather than running large-scale batch jobs or deploying models on individual user devices. Option A (import the model into BigQuery ML and do batch predictions): batch predictions are not real-time, so this approach introduces a significant delay between data ingestion and predictions; not ideal if you need to adapt the user experience quickly based on recent behavior.
upvoted 2 times
...
pinimichele01
1 year ago
Selected Answer: A
Making predictions after every in-app purchase is not necessary -> A
upvoted 2 times
...
Mickey321
1 year, 6 months ago
Selected Answer: D
Embedding the model in a streaming Dataflow pipeline allows low latency predictions on real-time events published to Pub/Sub. This provides a responsive user experience. Dataflow provides a managed service to scale predictions and integrate with Pub/Sub, without having to manage servers. Streaming predictions only when events occur optimizes cost compared to bulk or client-side prediction. Pushing results to Cloud SQL provides a managed database for persistence. In contrast, options A and B use inefficient batch predictions. Option C increases mobile app size and cost.
upvoted 2 times
...
SamuelTsch
1 year, 10 months ago
Selected Answer: D
D could be correct
upvoted 1 times
...
Nxtgen
1 year, 10 months ago
Selected Answer: D
These were my reasons for choosing D as the best option: B -> Vertex AI would not minimize cost; C -> would not optimize user experience (this may lead to slow running of the game (lag)?); A -> would not optimize ease of management / automation; D -> best choice?
upvoted 1 times
tavva_prudhvi
1 year, 6 months ago
Why do you want to make a prediction after every app purchase bro?
upvoted 3 times
...
...
M25
2 years ago
Selected Answer: D
For "used to adapt each user's game experience" points out to non-batch, hence excludes A & B, and embedding the model in the mobile app would not necessarily "optimize cost". Plus, the classical streaming solution builds on Dataflow along with Pub/Sub and BigQuery, embedding ML in Dataflow is low-code https://cloud.google.com/blog/products/data-analytics/latest-dataflow-innovations-for-real-time-streaming-and-aiml and apparently a modified version of the question points to the same direction https://mikaelahonen.com/en/data/gcp-mle-exam-questions/
upvoted 3 times
ciro_li
1 year, 9 months ago
There's no need to make a prediction after every in-app purchase event. Am I wrong?
upvoted 4 times
...
...
TNT87
2 years ago
Selected Answer: A
Yeah its A
upvoted 2 times
...
TNT87
2 years, 2 months ago
Selected Answer: C
Answer C
upvoted 2 times
tavva_prudhvi
2 years, 1 month ago
Option C, embedding the model in the mobile application, can increase the size of the application and may not be suitable for real-time prediction.
upvoted 2 times
...
...

Topic 1 Question 106

Exam Professional Machine Learning Engineer topic 1 question 106 discussion

You are building a linear regression model on BigQuery ML to predict a customer’s likelihood of purchasing your company’s products. Your model uses a city name variable as a key predictive component. In order to train and serve the model, your data must be organized in columns. You want to prepare your data using the least amount of coding while maintaining the predictable variables. What should you do?

  • A. Use TensorFlow to create a categorical variable with a vocabulary list. Create the vocabulary file, and upload it as part of your model to BigQuery ML.
  • B. Create a new view with BigQuery that does not include a column with city information
  • C. Use Cloud Data Fusion to assign each city to a region labeled as 1, 2, 3, 4, or 5, and then use that number to represent the city in the model.
  • D. Use Dataprep to transform the state column using a one-hot encoding method, and make each city a column with binary values.
Suggested Answer: D 🗳️

Comments

fitri001
1 year ago
Selected Answer: D
A. Using TensorFlow: this is overkill for this scenario; BigQuery ML handles categorical encoding natively, and Dataprep can do it without code. B. Excluding city information: this removes a potentially important predictive variable, reducing model accuracy. C. Assigning region labels: this approach loses granularity and might not capture the specific variations between cities.
upvoted 3 times
...
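For intuition, the one-hot transform that Dataprep applies in option D can be sketched in plain Python. This is a minimal illustration only; in practice you would let Dataprep (or BigQuery ML's automatic preprocessing) do it without code:

```python
def one_hot(values):
    """Turn a list of category labels into one binary column per category."""
    categories = sorted(set(values))  # stable column order
    return categories, [
        [1 if value == cat else 0 for cat in categories]
        for value in values
    ]

cols, encoded = one_hot(["Paris", "Tokyo", "Paris"])
# cols    -> ['Paris', 'Tokyo']
# encoded -> [[1, 0], [0, 1], [1, 0]]
```

Each city becomes its own column with binary values, which is exactly the columnar layout the question asks for.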
andresvelasco
1 year, 8 months ago
Selected Answer: D
D by elimination, but... doesn't BigQuery ML automatically do one-hot encoding of categorical features for you? Also, the wording of the question doesn't seem right: a linear regression model to predict the likelihood that the customer purchases... isn't that a classification model?
upvoted 1 times
...
M25
2 years ago
Selected Answer: D
Went with D
upvoted 1 times
...
Yajnas_arpohc
2 years, 1 month ago
Is it correct to say that A is technically a better way to do things if the ask was for separate columns?
upvoted 1 times
tavva_prudhvi
2 years, 1 month ago
"least amount of coding"
upvoted 5 times
...
...
guilhermebutzke
2 years, 1 month ago
Selected Answer: D
One-hot is a good way to use categorical variables in regression problems: https://academic.oup.com/rheumatology/article/54/7/1141/1849688 https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-auto-preprocessing
upvoted 3 times
...
TNT87
2 years, 2 months ago
Selected Answer: D
Answer D
upvoted 1 times
...
abneural
2 years, 2 months ago
Selected Answer: C
For a fuller answer: D -> transforms the "state" column, not the city column; C -> at least works with the city column.
upvoted 1 times
tavva_prudhvi
2 years, 1 month ago
Read smarques comment
upvoted 1 times
...
...
John_Pongthorn
2 years, 3 months ago
Selected Answer: D
https://docs.trifacta.com/display/SS/Prepare+Data+for+Machine+Processing
upvoted 2 times
...
smarques
2 years, 3 months ago
Selected Answer: D
This will allow you to maintain the city name variable as a predictor while ensuring that the data is in a format that can be used to train a linear regression model on BigQuery ML.
upvoted 1 times
...
Abhijat
2 years, 4 months ago
Selected Answer: D
Answer D
upvoted 1 times
...
Abhijat
2 years, 4 months ago
Answer is D
upvoted 1 times
...
mymy9418
2 years, 4 months ago
Selected Answer: D
one-hot encoding makes sense to me
upvoted 2 times
...
hiromi
2 years, 4 months ago
Selected Answer: C
I vote for C
upvoted 2 times
hiromi
2 years, 4 months ago
Changing my vote to D
upvoted 3 times
...
...

Topic 1 Question 107

Exam Professional Machine Learning Engineer topic 1 question 107 discussion

You are an ML engineer at a bank that has a mobile application. Management has asked you to build an ML-based biometric authentication for the app that verifies a customer’s identity based on their fingerprint. Fingerprints are considered highly sensitive personal information and cannot be downloaded and stored into the bank databases. Which learning strategy should you recommend to train and deploy this ML model?

  • A. Data Loss Prevention API
  • B. Federated learning
  • C. MD5 to encrypt data
  • D. Differential privacy
Suggested Answer: B 🗳️

Comments

hiromi
Highly Voted 2 years, 4 months ago
Selected Answer: B
B. With federated learning, the model is trained with algorithms across multiple decentralized edge devices such as cell phones or websites, each holding its own local data, without exchanging that data. (Journey to Become a Google Cloud Machine Learning Engineer: Build the mind and hand of a Google Certified ML professional)
upvoted 10 times
...
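The key point in these comments is that only model updates, never raw fingerprints, leave the device; the server just aggregates them. A minimal sketch of the server-side aggregation step of federated averaging (FedAvg), with illustrative toy weights rather than any real fingerprint model:

```python
def federated_average(client_weights, client_sizes):
    """Aggregate per-client model weight vectors into one global model,
    weighting each client by its local dataset size (FedAvg)."""
    total = sum(client_sizes)
    dims = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dims)
    ]

# Two clients trained locally; only their weight vectors reach the server.
global_model = federated_average([[0.0, 1.0], [1.0, 3.0]], [100, 300])
# -> [0.75, 2.5]
```

The raw training data (here, the fingerprints) is never collected centrally, which is exactly why answer B satisfies the "cannot be downloaded and stored" constraint.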
fitri001
Most Recent 1 year ago
Selected Answer: B
Federated learning allows training the model on the users' devices themselves. The model updates its parameters based on local training data on the device, without the raw fingerprint information ever needing to leave the device. This ensures the highest level of privacy for sensitive biometric data.
upvoted 2 times
fitri001
1 year ago
Data Loss Prevention (DLP) API: this focuses on protecting data at rest and in transit, and is not relevant to training a model without storing data. MD5: this is a one-way hashing function, not an encryption scheme, so it is not suitable here. Differential privacy: while it adds noise to protect privacy, it is not by itself a learning strategy for training without collecting the data.
upvoted 1 times
...
...
Voyager2
1 year, 11 months ago
B. Federated learning. "Information ... cannot be downloaded and stored into the bank databases": that excludes DLP. Federated learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on the device, decoupling the ability to do machine learning from the need to store the data in the cloud.
upvoted 3 times
...
M25
2 years ago
Selected Answer: B
Went with B
upvoted 1 times
...
Yajnas_arpohc
2 years, 1 month ago
Selected Answer: B
I think the giveaway is in the question "Which learning strategy.."... Federated Learning seems to be the only one !
upvoted 3 times
...
TNT87
2 years, 2 months ago
Selected Answer: B
B. Federated learning would be the best learning strategy to train and deploy the ML model for biometric authentication in this scenario. Federated learning allows for training an ML model on distributed data without transferring the raw data to a centralized location.
upvoted 2 times
...
zzzzzooooo
2 years, 2 months ago
Selected Answer: A
Ans is A for me
upvoted 1 times
...
ares81
2 years, 4 months ago
Selected Answer: A
It seems A, to me.
upvoted 1 times
...
mil_spyro
2 years, 5 months ago
Selected Answer: B
Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device. https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
upvoted 1 times
...

Topic 1 Question 108

Exam Professional Machine Learning Engineer topic 1 question 108 discussion

You are experimenting with a built-in distributed XGBoost model in Vertex AI Workbench user-managed notebooks. You use BigQuery to split your data into training and validation sets using the following queries:

CREATE OR REPLACE TABLE `myproject.mydataset.training` AS
(SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() <= 0.8);

CREATE OR REPLACE TABLE `myproject.mydataset.validation` AS
(SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() <= 0.2);

After training the model, you achieve an area under the receiver operating characteristic curve (AUC ROC) value of 0.8, but after deploying the model to production, you notice that your model performance has dropped to an AUC ROC value of 0.65. What problem is most likely occurring?

  • A. There is training-serving skew in your production environment.
  • B. There is not a sufficient amount of training data.
  • C. The tables that you created to hold your training and validation records share some records, and you may not be using all the data in your initial table.
  • D. The RAND() function generated a number that is less than 0.2 in both instances, so every record in the validation table will also be in the training table.
Suggested Answer: C 🗳️

Comments

M25
Highly Voted 2 years, 6 months ago
Selected Answer: C
- Excluding D as RAND() samples 80% for “.training” & 20% for “.validation”: https://stackoverflow.com/questions/42115968/how-does-rand-works-in-bigquery; - Could be that those 2 samplings share some records since pseudo-randomly sampled over the same “.mytable”, & therefore might not be using all of its data, thus C seems valid; - Excluding B as there is no indication otherwise of an insufficient amount of training data; after training, AUC ROC was 0.8, that we know; - There could be training-serving skew occurring in Prod, but “most likely occurring” is C as a result of the selective information presented: https://developers.google.com/machine-learning/guides/rules-of-ml#training-serving_skew
upvoted 5 times
...
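The overlap (and the dropped rows) caused by evaluating RAND() independently in the two queries is easy to simulate. Because the draws are independent, roughly 0.8 × 0.2 = 16% of rows land in both tables and roughly 0.2 × 0.8 = 16% land in neither (a minimal sketch using Python's random in place of BigQuery's RAND()):

```python
import random

random.seed(42)
n = 100_000
rows = range(n)

# Each CREATE TABLE statement evaluates RAND() independently per row.
train = {r for r in rows if random.random() <= 0.8}
validation = {r for r in rows if random.random() <= 0.2}

leaked = len(train & validation) / n            # expect ~0.16: records in both tables
unused = len(set(rows) - train - validation) / n  # expect ~0.16: records in neither
```

The leaked fraction inflates the validation AUC (answer C), and the unused fraction means not all of the initial table is used.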
8619d79
Most Recent 9 months, 1 week ago
Selected Answer: C
Even so, I don't get the sentence "you may not be using all the data in your initial table"; a percentage should also be kept aside for testing, no?
upvoted 1 times
...
eico
1 year, 2 months ago
Selected Answer: C
Answer C
upvoted 1 times
...
formazioneQI
2 years, 6 months ago
Selected Answer: C
Answer C
upvoted 2 times
...
Yajnas_arpohc
2 years, 7 months ago
Selected Answer: C
C seems closest here
upvoted 1 times
...
TNT87
2 years, 8 months ago
Selected Answer: C
Answer C
upvoted 1 times
...
ailiba
2 years, 8 months ago
Selected Answer: C
Since we are calling RAND() twice, data that was in the training set might end up in the validation set too. If we had called it just once, I would say D.
upvoted 2 times
...
Ahmades
2 years, 10 months ago
Selected Answer: D
Hesitated between C and D, but D looks more precise
upvoted 1 times
pshemol
2 years, 9 months ago
If there were one RAND() shared by those two queries, it would be true. There are two separate RAND() calls, so "every record in the validation table will also be in the training table" is not true.
upvoted 2 times
...
...
hiromi
2 years, 10 months ago
Selected Answer: C
C (not sure)
upvoted 4 times
...
mymy9418
2 years, 10 months ago
Selected Answer: C
the rand is generated twice
upvoted 2 times
...

Topic 1 Question 109

Exam Professional Machine Learning Engineer topic 1 question 109 discussion

During batch training of a neural network, you notice that there is an oscillation in the loss. How should you adjust your model to ensure that it converges?

  • A. Decrease the size of the training batch.
  • B. Decrease the learning rate hyperparameter.
  • C. Increase the learning rate hyperparameter.
  • D. Increase the size of the training batch.
Suggested Answer: B 🗳️

Comments

hiromi
Highly Voted 2 years, 4 months ago
Selected Answer: B
B. Larger learning rates can reduce training time but may lead to model oscillation and may miss the optimal model parameter values.
upvoted 10 times
...
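The effect is easy to see with gradient descent on the toy loss f(x) = x², whose gradient is 2x (a minimal sketch, not a neural network): a large learning rate makes the iterate overshoot the minimum and oscillate around it, while a smaller one converges smoothly.

```python
def descend(lr, steps=30, x=1.0):
    """Gradient descent on f(x) = x^2; each update is x -= lr * f'(x)."""
    history = []
    for _ in range(steps):
        x -= lr * 2 * x  # f'(x) = 2x
        history.append(x)
    return history

smooth = descend(lr=0.1)  # shrinks monotonically toward 0
wobble = descend(lr=0.9)  # overshoots: sign flips every step
blowup = descend(lr=1.2)  # overshoots so far that |x| grows each step
```

Each update multiplies x by (1 - 2·lr), so lr = 0.1 contracts steadily, lr = 0.9 oscillates, and lr = 1.2 diverges; decreasing the learning rate (answer B) is what restores convergence.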
desertlotus1211
Most Recent 8 months, 2 weeks ago
Selected Answer: B
When you observe oscillations in the loss during training, it is often a sign that the learning rate is too high. A high learning rate can cause the optimizer to overshoot the minimum of the loss function
upvoted 1 times
...
fitri001
1 year ago
Selected Answer: B
A. Decrease Batch Size: While a smaller batch size can sometimes help with convergence, it can also lead to slower training. It might not necessarily address the issue of oscillation. C. Increase Learning Rate: A higher learning rate can cause the loss to jump around more erratically, potentially worsening the oscillation problem. D. Increase Batch Size: A larger batch size can lead to smoother updates but might also make the model less sensitive to local gradients and hinder convergence, especially with an already oscillating loss.
upvoted 2 times
...
Akel123
1 year ago
Selected Answer: C
I don't understand
upvoted 2 times
...
M25
2 years ago
Selected Answer: B
Went with B
upvoted 1 times
...
TNT87
2 years, 2 months ago
Selected Answer: B
Answer B
upvoted 1 times
...
enghabeth
2 years, 3 months ago
Selected Answer: B
Having a large learning rate results in instability or oscillations. Thus, the first solution is to tune the learning rate by gradually decreasing it. https://towardsdatascience.com/8-common-pitfalls-in-neural-network-training-workarounds-for-them-7d3de51763ad
upvoted 1 times
...
mymy9418
2 years, 4 months ago
Selected Answer: B
https://ai.stackexchange.com/questions/14079/what-could-an-oscillating-training-loss-curve-represent
upvoted 2 times
...

Topic 1 Question 110

Exam Professional Machine Learning Engineer topic 1 question 110 discussion

You work for a toy manufacturer that has been experiencing a large increase in demand. You need to build an ML model to reduce the amount of time spent by quality control inspectors checking for product defects. Faster defect detection is a priority. The factory does not have reliable Wi-Fi. Your company wants to implement the new ML model as soon as possible. Which model should you use?

  • A. AutoML Vision Edge mobile-high-accuracy-1 model
  • B. AutoML Vision Edge mobile-low-latency-1 model
  • C. AutoML Vision model
  • D. AutoML Vision Edge mobile-versatile-1 model
Suggested Answer: B 🗳️

Comments

mil_spyro
Highly Voted 2 years, 11 months ago
Since faster defect detection is a priority, the AutoML Vision Edge mobile-low-latency-1 model should be the choice. This model is designed to run efficiently on mobile devices and prioritizes low latency, which means it can provide fast defect detection without requiring a connection to the cloud. https://cloud.google.com/vision/automl/docs/train-edge
upvoted 11 times
maukaba
2 years ago
https://cloud.google.com/vertex-ai/docs/training/automl-edge-api
upvoted 1 times
...
...
hiromi
Highly Voted 2 years, 10 months ago
Selected Answer: B
B "reduce the amount of time spent by quality control inspectors checking for product defects."-> low latency
upvoted 6 times
...
YushiSato
Most Recent 1 year, 3 months ago
The three Edge model types: low latency (MOBILE_TF_LOW_LATENCY_1), general-purpose usage (MOBILE_TF_VERSATILE_1), higher prediction quality (MOBILE_TF_HIGH_ACCURACY_1).
upvoted 2 times
...
fitri001
1 year, 6 months ago
Selected Answer: B
The AutoML Vision Edge mobile-low-latency-1 model prioritizes speed over accuracy, making it ideal for real-time defect detection on the factory floor without a stable internet connection. This allows for faster inspections and quicker identification of faulty products.
upvoted 3 times
fitri001
1 year, 6 months ago
Faster Defect Detection: This is the main priority, and the low-latency model is specifically designed for speed. Edge Device Compatibility: The model should run on a device without relying on Wi-Fi. AutoML Vision Edge models are optimized for edge deployments.
upvoted 2 times
fitri001
1 year, 6 months ago
A. AutoML Vision Edge mobile-high-accuracy-1 model: while high accuracy is desirable, faster defect detection is the top priority in this case, and this model might be slower due to its focus on accuracy. C. AutoML Vision model: this model is designed for cloud deployment and might not be suitable for running on an edge device without reliable Wi-Fi. D. AutoML Vision Edge mobile-versatile-1 model: this model prioritizes a balance between accuracy and latency; while faster than the high-accuracy model, it might be slower than the low-latency model for this specific use case.
upvoted 3 times
...
...
...
MultiCloudIronMan
1 year, 7 months ago
Selected Answer: B
Edge device with low latency
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 1 times
...
TNT87
2 years, 8 months ago
Selected Answer: B
Answer B
upvoted 1 times
...
ares81
2 years, 10 months ago
Selected Answer: B
It's B.
upvoted 1 times
...
mil_spyro
2 years, 11 months ago
Selected Answer: B
vote B
upvoted 4 times
...

Topic 1 Question 111

Exam Professional Machine Learning Engineer topic 1 question 111 discussion

You need to build classification workflows over several structured datasets currently stored in BigQuery. Because you will be performing the classification several times, you want to complete the following steps without writing code: exploratory data analysis, feature selection, model building, training, and hyperparameter tuning and serving. What should you do?

  • A. Train a TensorFlow model on Vertex AI.
  • B. Train a classification Vertex AutoML model.
  • C. Run a logistic regression job on BigQuery ML.
  • D. Use scikit-learn in Notebooks with pandas library.
Suggested Answer: B 🗳️

Comments

hiromi
Highly Voted 2 years, 4 months ago
Selected Answer: B
B (similar to question 7)
upvoted 8 times
...
fitri001
Most Recent 1 year ago
Selected Answer: B
Vertex AutoML is a Google Cloud Platform service designed for building machine learning models without writing code. It automates the stages of the machine learning pipeline mentioned in the question: exploratory data analysis, feature selection, model building (supporting various classification algorithms), training, and hyperparameter tuning.
upvoted 2 times
...
M25
2 years ago
Selected Answer: B
Went with B
upvoted 1 times
...
TNT87
2 years, 2 months ago
Selected Answer: B
Answer B
upvoted 1 times
...
ares81
2 years, 4 months ago
Selected Answer: B
A and D need coding. C is regression, not classification. Hence B.
upvoted 1 times
ares81
2 years, 4 months ago
My mistake, it's logistic regression, meaning classification. But it still requires some coding. So still B.
upvoted 4 times
...
...
mymy9418
2 years, 4 months ago
Selected Answer: B
BQML will still need coding; only AutoML in Vertex AI is codeless from end to end.
upvoted 3 times
...

Topic 1 Question 112

Exam Professional Machine Learning Engineer topic 1 question 112 discussion

You are an ML engineer in the contact center of a large enterprise. You need to build a sentiment analysis tool that predicts customer sentiment from recorded phone conversations. You need to identify the best approach to building a model while ensuring that the gender, age, and cultural differences of the customers who called the contact center do not impact any stage of the model development pipeline and results. What should you do?

  • A. Convert the speech to text and extract sentiments based on the sentences.
  • B. Convert the speech to text and build a model based on the words.
  • C. Extract sentiment directly from the voice recordings.
  • D. Convert the speech to text and extract sentiment using syntactical analysis.
Suggested Answer: A 🗳️

Comments

mymy9418
Highly Voted 2 years, 4 months ago
Selected Answer: A
Syntactic Analysis is not for sentiment analysis
upvoted 12 times
...
fitri001
Highly Voted 1 year ago
Selected Answer: A
A. Convert speech to text and extract sentiments based on sentences: This method focuses on the content of the conversation, minimizing the influence of factors like voice tone (which can be culturally or gender-specific). Sentiment analysis techniques can analyze the meaning and context of sentences to identify positive, negative, or neutral sentiment.
upvoted 8 times
fitri001
1 year ago
B. Convert speech to text and build a model based on the words: While words are important, relying solely on them can miss the context and lead to bias. For example, "great" might be positive in most cases, but in some cultures, it might be used sarcastically. C. Extract sentiment directly from voice recordings: This approach can be biased as voice characteristics like pitch or pace can vary based on gender, age, and cultural background. D. Convert speech to text and extract sentiment using syntactical analysis: While syntax can provide some clues, it's not the strongest indicator of sentiment. Additionally, cultural differences in sentence structure could impact accuracy.
upvoted 3 times
...
...
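To make the sentence-level idea of option A concrete, here is a minimal sketch: split the transcript into sentences, score each sentence, and average. The tiny lexicon and the naive punctuation-based splitting are illustrative assumptions only; a real pipeline would use a service such as the Cloud Natural Language API, which returns per-sentence sentiment.

```python
# Hypothetical toy sentiment lexicon, for illustration only.
LEXICON = {"great": 1.0, "good": 0.5, "bad": -0.5, "terrible": -1.0}

def sentence_score(sentence):
    """Average lexicon score of the words in one sentence (0.0 if none match)."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def transcript_sentiment(text):
    """Mean per-sentence sentiment over a naive punctuation-based split."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    return sum(sentence_score(s) for s in sentences) / len(sentences)

score = transcript_sentiment("The agent was great. The wait was terrible.")
# -> 0.0 (one positive and one negative sentence cancel out)
```

Because the score is derived only from the transcribed sentence content, demographic attributes of the caller's voice (pitch, accent, pace) never enter the pipeline, which is the point of answer A.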
desertlotus1211
Most Recent 8 months, 2 weeks ago
Selected Answer: D
Syntactical analysis goes a step further by analyzing the structure and grammatical relationships within the sentences. This approach abstracts away from just the words used and focuses on how the words are put together, capturing deeper semantic and contextual information that is less likely to be influenced by demographic variations in language use.
upvoted 1 times
...
RioGrande
1 year, 5 months ago
The correct answer should be A. Word embeddings have static embeddings for the same words, while contextual embeddings vary depending on the context. "May’s sentence embedding adaptation of WEAT, known as the Sentence Embedding Association Test (SEAT), shows less clear racial and gender bias in language models and embeddings than the corresponding word embedding formulation" From: https://medium.com/institute-for-applied-computational-science/bias-in-nlp-embeddings-b1dabb8bbe20
upvoted 2 times
...
pico
1 year, 5 months ago
Selected Answer: B
This approach involves converting the speech to text, which allows you to analyze the content of the conversations without directly dealing with the speakers' gender, age, or cultural differences. By building a model based on the words, you can focus on the language used in the conversations to predict sentiment, making the model more inclusive and less sensitive to demographic factors. Option A could be influenced by the syntactical nuances and structures used in different cultures, and option C might be impacted by the variations in voice tones across genders and ages. Option B, on the other hand, relies on the text content, which provides a more neutral and content-focused basis for sentiment analysis.
upvoted 2 times
...
MCorsetti
1 year, 6 months ago
Selected Answer: B
B: People of different cultures will often use difference sentence structures, so words would be safer than sentences
upvoted 1 times
tavva_prudhvi
1 year, 6 months ago
Yeah, but they(words) may miss the context of the sentiment, leading to inaccuracies!
upvoted 1 times
...
...
tavva_prudhvi
1 year, 9 months ago
Selected Answer: A
Building a model based on words may also be effective but could be influenced by factors such as accents, dialects, or language variations that differ between speakers. Extracting sentiment directly from voice recordings may be less accurate due to the subjective nature of interpreting emotions from audio alone. Using syntactical analysis may be useful in certain contexts but may not capture the full range of sentiment expressed in a conversation. Therefore, A provides the most comprehensive and unbiased approach to sentiment analysis in this scenario.
upvoted 1 times
pico
1 year, 5 months ago
Option A could be influenced by the syntactical nuances and structures used in different cultures
upvoted 1 times
tavva_prudhvi
1 year, 5 months ago
See, both have their own advantages & dissadvantages, but we should choose the option which is more relevant
upvoted 1 times
...
...
...
ciro_li
1 year, 9 months ago
Selected Answer: A
Answer A
upvoted 1 times
ciro_li
1 year, 9 months ago
Answer B*
upvoted 1 times
...
...
erenklclar
1 year, 9 months ago
Selected Answer: C
By working directly with the audio data, you can account for important aspects like tone, pitch, and rhythm of speech, which might provide valuable information regarding sentiment.
upvoted 3 times
...
NickHapton
1 year, 10 months ago
Vote for A. Between words and sentences, sentences win on age and gender considerations: they provide a broader view of sentiment that helps mitigate age and gender biases. Analyzing at the sentence level lets you observe sentiment patterns across demographic groups, which helps identify any biases that arise, and considering the overall sentiment expressed in sentences minimizes the impact of individual words that might carry specific biases.
upvoted 1 times
...
M25
2 years ago
Selected Answer: C
There is the possibility for a more sophisticated architecture for an audio processing pipeline, and the “not impact any stage of the model development pipeline and results” somewhat calls for a more holistic answer: https://cloud.google.com/architecture/categorizing-audio-files-using-ml#converting_speech_to_text. Plus, it adds “voice emotion information, related to an audio recording, indicating that a vocal utterance of a speaker is spoken with negative or positive emotion”: https://patents.google.com/patent/US20140220526A1/en.
upvoted 2 times
M25
2 years ago
The emphasis here is on #ResponsibleAI https://cloud.google.com/natural-language/automl/docs/beginners-guide
upvoted 1 times
...
M25
2 years ago
A reason why one could exclude “Convert the speech to text” altogether [Options A, B & D] could be, for instance, because “speech transcription may have higher error rates for African Americans than White Americans [3]”: https://developers.googleblog.com/2018/04/text-embedding-models-contain-bias.html.
upvoted 1 times
M25
2 years ago
“Cloud NL API can perform syntactic analysis directly on a file located in Cloud Storage.” “Syntactic Analysis [Option D] breaks up the given text into a series of sentences [Option A] and tokens (generally, words [Option B]) and provides linguistic information about those tokens”: https://cloud.google.com/natural-language/docs/analyzing-syntax. It “can be used to identify the parts of speech, determine the structure of a sentence, and determine the meaning of words in context”: https://ts2.space/en/a-comprehensive-guide-to-google-cloud-natural-language-apis-syntax-analysis/.
upvoted 1 times
...
...
...
formazioneQI
2 years ago
Selected Answer: B
I agree with qaz09. To avoid influence from demographic variables, the model should be built on the words.
upvoted 2 times
...
TNT87
2 years, 2 months ago
Selected Answer: A
Answer A
upvoted 1 times
...
qaz09
2 years, 3 months ago
Selected Answer: B
For "ensuring that the gender, age, and cultural differences of the customers who called the contact center do not impact any stage of the model development pipeline and results" I think the model should be built on the words rather than sentences
upvoted 3 times
...
ares81
2 years, 4 months ago
Selected Answer: A
A makes sense, to me.
upvoted 1 times
...
hiromi
2 years, 4 months ago
Selected Answer: A
A Convert the speech to text and extract sentiments based on the sentences.
upvoted 1 times
...
mil_spyro
2 years, 5 months ago
Selected Answer: D
vote D
upvoted 1 times
Yajnas_arpohc
2 years, 1 month ago
Based only on words might be misleading; at a minimum need to go w sentences
upvoted 2 times
...
...
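As a concrete illustration of the sentence-level approach debated above, here is a toy Python sketch: it assumes speech has already been converted to text and scores each sentence against a small, made-up lexicon. A real pipeline would use a sentiment service such as the Cloud Natural Language API rather than this hand-rolled scorer; the lexicon, scores, and transcript below are all invented for the example.

```python
# Toy sentence-level sentiment scoring (the idea behind option A):
# transcribe first, then score each sentence. The lexicon is made up.
LEXICON = {"great": 1, "happy": 1, "helpful": 1, "bad": -1, "slow": -1, "angry": -1}

def sentence_sentiment(transcript: str) -> list[tuple[str, int]]:
    """Return (sentence, score) pairs; score is the sum of word polarities."""
    sentences = [s.strip() for s in transcript.split(".") if s.strip()]
    results = []
    for sentence in sentences:
        words = sentence.lower().split()
        # Sum the polarity of every lexicon word in the sentence.
        score = sum(LEXICON.get(w.strip(",!?"), 0) for w in words)
        results.append((sentence, score))
    return results

scores = sentence_sentiment("The agent was great and helpful. The wait was slow.")
```

Scoring whole sentences rather than isolated words is what lets the aggregate sentiment smooth over individual word choices that may vary across demographic groups.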

Topic 1 Question 113


Exam Professional Machine Learning Engineer topic 1 question 113 discussion

You need to analyze user activity data from your company’s mobile applications. Your team will use BigQuery for data analysis, transformation, and experimentation with ML algorithms. You need to ensure real-time ingestion of the user activity data into BigQuery. What should you do?

  • A. Configure Pub/Sub to stream the data into BigQuery.
  • B. Run an Apache Spark streaming job on Dataproc to ingest the data into BigQuery.
  • C. Run a Dataflow streaming job to ingest the data into BigQuery.
  • D. Configure Pub/Sub and a Dataflow streaming job to ingest the data into BigQuery.
Suggested Answer: A 🗳️

Comments

pshemol
Highly Voted 2 years, 10 months ago
Selected Answer: A
Previously Google pattern was Pub/Sub -> Dataflow -> BQ but now it looks as there is new Pub/Sub -> BQ https://cloud.google.com/blog/products/data-analytics/pub-sub-launches-direct-path-to-bigquery-for-streaming-analytics
upvoted 20 times
TNT87
2 years, 8 months ago
New pub sub??? heheheh
upvoted 1 times
...
TNT87
2 years, 8 months ago
https://cloud.google.com/blog/products/data-analytics/pub-sub-launches-direct-path-to-bigquery-for-streaming-analytics You should have said Pub/Sub has been upgraded to stream directly to BigQuery...not a new Pub/Sub
upvoted 2 times
...
...
HaroonRaizada01
Most Recent 8 months ago
Selected Answer: D
This approach provides a scalable, flexible, and efficient solution for real-time data ingestion and transformation, ensuring that user activity data is seamlessly integrated into BigQuery for analysis and experimentation. BigQuery can perform powerful data transformations using SQL. However, there are key differences in how BigQuery and Dataflow handle data, especially in the context of real-time data ingestion.
upvoted 2 times
...
desertlotus1211
8 months, 2 weeks ago
Selected Answer: D
Pub/Sub and Dataflow are needed for real-time ingestion. Pub/Sub cannot do it alone
upvoted 1 times
desertlotus1211
8 months, 2 weeks ago
Now I'm torn between A or D... since you can use BQ subscriptions...
upvoted 1 times
desertlotus1211
8 months, 2 weeks ago
Ahh, it's D: "Your team will use BigQuery for data analysis, transformation, and experimentation"... this is key!
upvoted 1 times
...
...
...
rajshiv
11 months, 1 week ago
Selected Answer: D
Pub/Sub is used for message ingestion and it cannot directly load data into BigQuery. Pub/Sub only delivers messages, but we will need Dataflow or another processing tool to transform and load the data into BigQuery. Hence it is D in my opinion.
upvoted 2 times
...
Pau1234
11 months, 1 week ago
Selected Answer: D
Since Data transformation is needed.
upvoted 2 times
...
baimus
1 year, 2 months ago
Selected Answer: A
The question specifies that transformation occurs in Bigquery. This means the new direct pub/sub to bigquery streaming path is correct.
upvoted 2 times
...
Prakzz
1 year, 4 months ago
Selected Answer: D
Need PubSub and Dataflow both for this
upvoted 1 times
...
ludovikush
1 year, 8 months ago
Selected Answer: D
Werner123 i agree
upvoted 1 times
...
Werner123
1 year, 8 months ago
Selected Answer: D
User data would most likely include PII, for that case it is still recommended to use Dataflow since you need to remove/anonymise sensitive data.
upvoted 2 times
...
pico
1 year, 12 months ago
I would have added "with / without data transformation" to the question to choose the right answer between A or D
upvoted 1 times
...
andresvelasco
2 years, 2 months ago
Selected Answer: A
I had my doubts between A and D. But since the transformation will occur in bigquery I think Pubsub suffices.
upvoted 3 times
...
M25
2 years, 6 months ago
Selected Answer: D
Agree with TNT87. From the same link: “For Pub/Sub messages where advanced preload transformations or data processing before landing data in BigQuery (such as masking PII) is necessary, we still recommend going through Dataflow.” It’s “analyze user activity data”, not merely streaming IoT into BigQuery so that concerns like privacy are per se n/a. One can deal with PII after landing in BigQuery as well, but apparently that’s not what they recommend.
upvoted 3 times
...
PHD_CHENG
2 years, 7 months ago
Selected Answer: D
Pub/Sub -> DataFlow -> BigQuery
upvoted 2 times
...
TNT87
2 years, 8 months ago
Selected Answer: D
D. Configure Pub/Sub and a Dataflow streaming job to ingest the data into BigQuery. This solution involves using Google Cloud Pub/Sub as the messaging service to receive the data from the mobile application, and then using Google Cloud Dataflow to transform and load the data into BigQuery in real time. Pub/Sub is a scalable and reliable messaging service that can handle high-volume real-time data streaming, while Dataflow provides a unified programming model to develop and run data processing pipelines. This solution is suitable for handling large volumes of user activity data from mobile applications and ingesting it into BigQuery in real-time for analysis and ML experimentation.
upvoted 2 times
TNT87
2 years, 8 months ago
Starting today, you no longer have to write or run your own pipelines for data ingestion from Pub/Sub into BigQuery. We are introducing a new type of Pub/Sub subscription called a “BigQuery subscription” that writes directly from Cloud Pub/Sub to BigQuery. This new extract, load, and transform (ELT) path will be able to simplify your event-driven architecture. For Pub/Sub messages where advanced preload transformations or data processing before landing data in BigQuery (such as masking PII) is necessary, we still recommend going through Dataflow
upvoted 1 times
...
...
hiromi
2 years, 10 months ago
Selected Answer: A
A agree with pshemol
upvoted 3 times
...
mymy9418
2 years, 10 months ago
Selected Answer: D
need dataflow
upvoted 2 times
mil_spyro
2 years, 10 months ago
transformation will be handled in BQ hence I think A
upvoted 6 times
mymy9418
2 years, 10 months ago
agree.
upvoted 1 times
...
...
...
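The Pub/Sub → Dataflow → BigQuery path from option D can be sketched with an in-memory toy: a queue stands in for the Pub/Sub topic, a transform function for the Dataflow step (here, masking a hypothetical `user_id` PII field), and a plain list for the BigQuery table. Nothing below calls Google Cloud; it only illustrates the shape of the data flow the commenters describe.

```python
# In-memory stand-in for Pub/Sub -> Dataflow -> BigQuery (option D).
import json
import queue

events = queue.Queue()            # plays the Pub/Sub topic
table = []                        # plays the BigQuery table

def mask_pii(row: dict) -> dict:  # plays the Dataflow transform
    row = dict(row)
    row["user_id"] = "REDACTED"   # hypothetical PII masking before landing
    return row

# "Publish" two user-activity messages as JSON payloads.
for payload in ({"user_id": "u1", "action": "open"},
                {"user_id": "u2", "action": "click"}):
    events.put(json.dumps(payload))

# "Streaming job": drain the queue, transform, append to the table.
while not events.empty():
    table.append(mask_pii(json.loads(events.get())))
```

Option A's BigQuery subscription would collapse the queue-draining step into a managed direct write, at the cost of losing the preload transform shown here.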

Topic 1 Question 114


Exam Professional Machine Learning Engineer topic 1 question 114 discussion

You work for a gaming company that manages a popular online multiplayer game where teams with 6 players play against each other in 5-minute battles. There are many new players every day. You need to build a model that automatically assigns available players to teams in real time. User research indicates that the game is more enjoyable when battles have players with similar skill levels. Which business metrics should you track to measure your model’s performance?

  • A. Average time players wait before being assigned to a team
  • B. Precision and recall of assigning players to teams based on their predicted versus actual ability
  • C. User engagement as measured by the number of battles played daily per user
  • D. Rate of return as measured by additional revenue generated minus the cost of developing a new model
Suggested Answer: C 🗳️

Comments

pshemol
Highly Voted 2 years, 10 months ago
Selected Answer: C
The game is more enjoyable - the better and "business metrics" points me to user engagement as best metric
upvoted 12 times
...
b7ad1d9
Most Recent 1 month, 3 weeks ago
Selected Answer: C
For the exam, I would select C because "user engagement" is the metric that defines how enjoyable the game is. However, in real life, if I was the model owner I would ask to be judged on option B (precision and recall of predicted vs. actual ability) because that is what my model is doing! User engagement might be off due to other reasons beyond matched ability. I don't know if the PMLE exam is that nuanced!
upvoted 1 times
...
rajshiv
11 months, 1 week ago
Selected Answer: B
I do not agree with C as User engagement would not help directly to evaluate whether players are enjoying more balanced matches due to the model’s performance. As the goal of the model is to assign players to teams based on their skill level so that teams are balanced and enjoyable for all participants I will go with B as the more appropriate answer.
upvoted 3 times
...
baimus
1 year, 2 months ago
Selected Answer: C
This question doesn't specify how "additional revenue" is measured. Most businesses I've worked for would love "D" for all our models instead of anything else. That being said, C is the only measurable business metric there.
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: C
focusing on user engagement through the number of battles played daily provides a clearer indication of whether the model successfully creates balanced and enjoyable matches, which is the core objective. If players find battles more engaging due to fairer competition, they're more likely to keep playing. This can then translate to long-term benefits like increased retention and potential monetization opportunities.
upvoted 2 times
fitri001
1 year, 6 months ago
A. Average time players wait before being assigned to a team: While faster matchmaking is desirable, it shouldn't come at the expense of balanced teams. If wait times are very low but battles are imbalanced due to poor matchmaking, user engagement might suffer. B. Precision and recall of assigning players to skill level: These metrics are valuable for evaluating the model's ability to predict skill accurately. However, they don't directly measure the impact on user experience and enjoyment. D. Rate of return: This metric focuses on financial gain, which might not be the primary objective in this case. Prioritizing balanced teams for a more enjoyable experience can indirectly lead to higher user retention and potentially more revenue in the long run.
upvoted 4 times
...
...
edoo
1 year, 8 months ago
Selected Answer: C
Tempted by B but "user engagement" is the keyword.
upvoted 2 times
edoo
1 year, 8 months ago
I meant "business metric".
upvoted 2 times
...
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: C
Looking for "business metrics to track," I think C could be the most important metric. Although, option B is also a good choice.
upvoted 2 times
...
MCorsetti
2 years ago
Selected Answer: C
C: Business metric i.e. outcome driven
upvoted 1 times
...
tavva_prudhvi
2 years, 3 months ago
"Business metrics" does suggest that the question is looking for metrics that are relevant to the business goals of the company, rather than purely technical metrics. In that case, C could be a good choice. User engagement is an important metric for any online service, as it reflects how much users are enjoying and using the product. In the context of a multiplayer game, the number of battles played daily per user can indicate how well the model is doing in creating balanced teams that are enjoyable to play against. If the model is successful in creating balanced teams, then users are likely to play more games, which would increase user engagement. Therefore, C could be a suitable choice to track the performance of the model.
upvoted 3 times
...
Nxtgen
2 years, 4 months ago
Selected Answer: C
The focus is to obtain a model that assigns players to teams with players of a similar skill level (or average team 1 skill == average team 2 skill). A: A fast queue assignment may not focus on pairing players with the same level of skill; a random assignment would work. B: This would be an option but is more difficult to measure than C; we don't know if we have a measure of skill level, and for new players this metric would not be available at the beginning. I think "There are many new players every day." is a key point for discarding answer B. C: Players play more games daily ← players enjoy the game more, and the other way round should also apply. Easy to measure, also for new players. D: This focuses on costs and revenue, not on player matchmaking. I would go with C.
upvoted 2 times
...
Antmal
2 years, 6 months ago
Selected Answer: C
C because "user engagement" is a business metric https://support.google.com/analytics/answer/11109416?hl=en
upvoted 3 times
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
TNT87
2 years, 6 months ago
Selected Answer: C
Answer C
upvoted 1 times
...
PHD_CHENG
2 years, 7 months ago
Selected Answer: C
The question is asking about "available players". Therefore, the business metric is the user engagement.
upvoted 4 times
...
JamesDoe
2 years, 7 months ago
Selected Answer: C
Asks for >business metric<, and problem states "user research indicates that the game is more enjoyable when battles have players with similar skill levels.", which means more battles per user if your model is performing well.
upvoted 1 times
...
dfdrin
2 years, 8 months ago
Selected Answer: C
It's C. The question specifically asks for a business metric. Precision and recall are not business metrics, but user engagement is
upvoted 4 times
...
guilhermebutzke
2 years, 8 months ago
Selected Answer: B
The model uses 'ability' to create teams. From this, we can conclude that the system measures the player's skill. So, nothing is better than comparing the predicted ability with the actual ability to understand the performance of the model.
upvoted 3 times
...
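Tracking option C's business metric reduces to a simple aggregation over battle logs. The log format below is invented for the example; a real system would pull this from analytics events.

```python
# Average daily battles per active user -- the engagement metric in option C.
from collections import defaultdict

battles = [                       # hypothetical battle-completion log
    {"user": "a", "day": "2024-01-01"},
    {"user": "a", "day": "2024-01-01"},
    {"user": "b", "day": "2024-01-01"},
    {"user": "a", "day": "2024-01-02"},
]

counts = defaultdict(int)
for b in battles:
    counts[(b["user"], b["day"])] += 1   # battles per (user, day) pair

# Average battles per active user per day.
avg_daily_battles = sum(counts.values()) / len(counts)
```

If balanced matchmaking makes battles more enjoyable, this average should rise after the model ships, which is exactly the signal the "user engagement" answer relies on.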

Topic 1 Question 115


Exam Professional Machine Learning Engineer topic 1 question 115 discussion

You are building an ML model to predict trends in the stock market based on a wide range of factors. While exploring the data, you notice that some features have a large range. You want to ensure that the features with the largest magnitude don’t overfit the model. What should you do?

  • A. Standardize the data by transforming it with a logarithmic function.
  • B. Apply a principal component analysis (PCA) to minimize the effect of any particular feature.
  • C. Use a binning strategy to replace the magnitude of each feature with the appropriate bin number.
  • D. Normalize the data by scaling it to have values between 0 and 1.
Suggested Answer: D 🗳️

Comments

fitri001
Highly Voted 1 year ago
Selected Answer: D
D. Normalize the data by scaling it to have values between 0 and 1 (Min-Max scaling): This technique ensures all features contribute proportionally to the model's learning process. It prevents features with a larger magnitude from dominating the model and reduces the risk of overfitting.
upvoted 7 times
fitri001
1 year ago
A. Standardize the data by transforming it with a logarithmic function: While logarithmic transformation can help compress the range of skewed features, it might not be suitable for all features, and it can introduce non-linear relationships that might not be ideal for all machine learning algorithms. B. Apply a principal component analysis (PCA) to minimize the effect of any particular feature: PCA is a dimensionality reduction technique that can be useful, but its primary function is to reduce the number of features, not specifically address differences in feature scales. C. Use a binning strategy to replace the magnitude of each feature with the appropriate bin number: Binning can introduce information loss and might not capture the nuances within each bin, potentially affecting the model's accuracy.
upvoted 3 times
...
...
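The min-max normalization that option D describes can be sketched in a few lines of plain Python; scikit-learn's MinMaxScaler applies the same formula per column. The feature values below are invented.

```python
# Min-max scaling: rescale each feature to [0, 1] so large-magnitude
# features cannot dominate training.
def min_max_scale(values: list[float]) -> list[float]:
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant feature: map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

market_cap = [5e9, 2e11, 8e10]        # a hypothetical wide-range feature
scaled = min_max_scale(market_cap)    # min maps to 0.0, max to 1.0
```

Note the test-set caveat raised in the comments: values outside the range seen when `lo`/`hi` were computed would scale outside [0, 1], which is one reason log scaling is sometimes preferred for heavily skewed data.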
gscharly
Most Recent 1 year ago
Selected Answer: D
agree with pico
upvoted 2 times
...
pico
1 year, 5 months ago
Selected Answer: D
Not A because a logarithmic transformation may be appropriate for data with a skewed distribution, but it doesn't necessarily address the issue of features having different scales.
upvoted 4 times
Fer660
2 months, 2 weeks ago
Agree with pico. Also, nobody told us that the features are non-negative, and log could stumble there.
upvoted 1 times
...
...
Krish6488
1 year, 6 months ago
Selected Answer: D
Features with a larger magnitude might still dominate after a log transformation if the range of values is significantly different from other features. Scaling is better, will go with Option D
upvoted 1 times
...
envest
1 year, 9 months ago
by abylead: Min-Max scaling is a popular technique for normalizing stock price data. Logs are commonly used in finance to normalize relative data, such as returns.https://itadviser.dev/stock-market-data-normalization-for-time-series/
upvoted 1 times
...
djo06
1 year, 10 months ago
Selected Answer: D
D is the right answer
upvoted 1 times
...
NickHapton
1 year, 10 months ago
go for D, z-score. This question doesn't mention outlier, just large range. reason why not log transformation: log transformation is more suitable for addressing skewed distributions and reducing the impact of outliers. It compresses the range of values, especially for features with a large dynamic range. While it can help normalize the distribution, it doesn't directly address the issue of feature magnitude overpowering the model.
upvoted 1 times
...
SamuelTsch
1 year, 10 months ago
Selected Answer: A
From my point of view, log transformation is more tolerant to outliers. Thus, went to A.
upvoted 1 times
tavva_prudhvi
1 year, 10 months ago
In cases where the data has significant skewness or a large number of outliers, option A (log transformation) might be more suitable. However, if the primary concern is to equalize the influence of features with different magnitudes and the data is not heavily skewed or has few outliers, option D (normalizing the data) would be more appropriate.
upvoted 1 times
...
...
coolmenthol
1 year, 10 months ago
Selected Answer: A
See https://developers.google.com/machine-learning/data-prep/transform/normalization
upvoted 2 times
...
Antmal
1 year, 12 months ago
Selected Answer: A
A is a better option because a log transform is used when we want a heavily skewed feature to be transformed into something as close to a normal distribution as possible. When you normalize data using a min-max scaler, it doesn't work well with many outliers and is prone to unexpected behaviour if values in the test set go outside the given range. It is a less popular alternative to scaling.
upvoted 2 times
tavva_prudhvi
1 year, 10 months ago
If your data is heavily skewed and has a significant number of outliers, log transformation (option A) might be a better choice. However, if your primary concern is to ensure that the features with the largest magnitudes don't overfit the model and the data does not have a significant skew or too many outliers, normalizing the data (option D) would be more appropriate.
upvoted 1 times
...
...
M25
2 years ago
Selected Answer: D
The challenge is the “scale” (significant variations in magnitude and spread): https://stats.stackexchange.com/questions/462380/does-data-normalization-reduce-over-fitting-when-training-a-model, apparently largely used anyhow: https://itadviser.dev/stock-market-data-normalization-for-time-series/.
upvoted 1 times
M25
2 years ago
“(…) some features have a large range”, possible presence of outliers exclude standardization [excluding A]: https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/. “(…) a wide range of factors”, PCA transform the data so that it can be described with fewer dimensions / features: https://en.wikipedia.org/wiki/Principal_component_analysis, but [excluding B]: it asks to “ensure that the features with largest magnitude don’t overfit the model”.
upvoted 1 times
...
M25
2 years ago
Even if binning “prevents overfitting and increases the robustness of the model”: https://www.analyticsvidhya.com/blog/2020/10/getting-started-with-feature-engineering, the disadvantage is that information is lost, particularly on features sharper than the binning: https://www.kaggle.com/questions-and-answers/171942, and then you need to reasonably re-adjust the binning to spot the moving target “trends” [excluding C]: https://stats.stackexchange.com/questions/230750/when-should-we-discretize-bin-continuous-independent-variables-features-and-when.
upvoted 1 times
...
...
niketd
2 years, 1 month ago
Selected Answer: D
The question doesn't talk about the skewness within each feature. It talks about normalizing the effect of features with large range. So scaling each feature within (0,1) range will solve the problem
upvoted 1 times
...
JamesDoe
2 years, 1 month ago
Really need more info to answer this: what does "large range" mean? Distribution follows a power law --> use log(). Or are they more evenly/linearly distributed --> use (0,1) scaling.
upvoted 1 times
...
guilhermebutzke
2 years, 1 month ago
Selected Answer: C
I think C could be a better choice. By bucketizing the data we can mitigate the distribution problem with bins. In option A, standardization by log may not be effective if the data range includes both negative and positive values. In option D, normalization definitely does not resolve the skew problem; data normalization assumes the data has something like a normal distribution. https://medium.com/analytics-vidhya/data-transformation-for-numeric-features-fb16757382c0
upvoted 3 times
...
TNT87
2 years, 2 months ago
Selected Answer: D
D. Normalize the data by scaling it to have values between 0 and 1. Standardization and normalization are common techniques to preprocess the data to be more suitable for machine learning models. Normalization scales the data to be within a specific range (commonly between 0 and 1 or -1 and 1), which can help prevent features with large magnitudes from dominating the model. This approach is especially useful when using models that are sensitive to the magnitude of features, such as distance-based models or neural networks.
upvoted 1 times
...
FherRO
2 years, 2 months ago
Selected Answer: A
https://developers.google.com/machine-learning/data-prep/transform/normalization#log-scaling
upvoted 1 times
...
shankalman717
2 years, 2 months ago
Selected Answer: D
The best approach to handle features with a large range in an ML model is to normalize the data by scaling it to have values between 0 and 1; therefore, the correct answer is D. Normalization ensures that the features have similar scales, which is important for many machine learning algorithms. If some features have a larger magnitude than others, they can dominate the objective function and make the model unable to learn from other features correctly. By scaling all the features to a similar range, we avoid this problem and make the objective function less sensitive to the scale of the input features. Standardizing the data by transforming it with a logarithmic function (option A) is not suitable for all types of data and may not always be effective in reducing the impact of features with a large range.
upvoted 1 times
...

Topic 1 Question 116


Exam Professional Machine Learning Engineer topic 1 question 116 discussion

You work for a biotech startup that is experimenting with deep learning ML models based on properties of biological organisms. Your team frequently works on early-stage experiments with new architectures of ML models, and writes custom TensorFlow ops in C++. You train your models on large datasets and large batch sizes. Your typical batch size has 1024 examples, and each example is about 1 MB in size. The average size of a network with all weights and embeddings is 20 GB. What hardware should you choose for your models?

  • A. A cluster with 2 n1-highcpu-64 machines, each with 8 NVIDIA Tesla V100 GPUs (128 GB GPU memory in total), and a n1-highcpu-64 machine with 64 vCPUs and 58 GB RAM
  • B. A cluster with 2 a2-megagpu-16g machines, each with 16 NVIDIA Tesla A100 GPUs (640 GB GPU memory in total), 96 vCPUs, and 1.4 TB RAM
  • C. A cluster with an n1-highcpu-64 machine with a v2-8 TPU and 64 GB RAM
  • D. A cluster with 4 n1-highcpu-96 machines, each with 96 vCPUs and 86 GB RAM
Suggested Answer: D 🗳️

Comments

aw_49
Highly Voted 2 years, 5 months ago
Selected Answer: D
D: use CPU when models that contain many custom TensorFlow operations written in C++ https://cloud.google.com/tpu/docs/intro-to-tpu#cpus
upvoted 8 times
...
Antmal
Highly Voted 2 years, 7 months ago
Selected Answer: B
The best hardware for your models would be a cluster with 2 a2-megagpu-16g machines, each with 16 NVIDIA Tesla A100 GPUs (640 GB GPU memory in total), 96 vCPUs, and 1.4 TB RAM. This hardware will give you the following benefits: High GPU memory: Each A100 GPU has 40 GB of memory, which is more than enough to store the weights and embeddings of your models. Large batch sizes: With 16 GPUs per machine, you can train your models with large batch sizes, which will improve training speed. Fast CPUs: The 96 vCPUs on each machine will provide the processing power you need to run your custom TensorFlow ops in C++. Adequate RAM: The 1.4 TB of RAM on each machine will ensure that your models have enough memory to train and run. The other options are not as suitable for your needs. Option A has less GPU memory, which will slow down training. Option B has more GPU memory, but it is also more expensive. Option C has a TPU, which is a good option for some deep learning tasks, but it is not as well-suited for your needs as a GPU cluster. Option D has more vCPUs and RAM, but it does not have enough GPU memory to train your models. Therefore, the best hardware for your models is a cluster with 2 a2-megagpu-16g machines.
upvoted 6 times
...
dija123
Most Recent 4 weeks, 1 day ago
Selected Answer: B
While it's a common piece of advice to use CPUs for TensorFlow models with many custom C++ operations, it's more of a practical guideline than a strict rule. The reason behind this recommendation boils down to ease of implementation and the nature of custom operations themselves. However, for performance-critical applications, especially those involving parallelizable computations, implementing GPU support for your custom ops is often the better long-term strategy.
upvoted 1 times
...
dija123
1 month, 1 week ago
Selected Answer: B
B is perfect fit for the question requirements
upvoted 1 times
...
rajshiv
11 months, 1 week ago
Selected Answer: B
I do not agree that D is correct. The option provides significant CPU resources, but it lacks GPU acceleration, which is necessary for efficiently training large deep learning models with large datasets. While CPUs can handle certain operations, they are generally much slower for training deep learning models compared to GPUs or TPUs. Choice B provides the best hardware for deep learning workload, offering 16 NVIDIA A100 GPUs with 640 GB of GPU memory, along with sufficient CPU and RAM resources to handle large datasets and complex model architectures.
upvoted 4 times
bc3f222
8 months ago
TensorFlow operations written in C++, so D
upvoted 1 times
denys2345
5 months, 2 weeks ago
It is good for GPU
upvoted 1 times
...
...
...
edoo
1 year, 8 months ago
Selected Answer: D
B looks like unleashing a rocket launcher to swat a fly ("early-stage experiments"). D is enough (c++).
upvoted 2 times
...
tavva_prudhvi
2 years, 3 months ago
While it is true that using CPUs can be more efficient when dealing with custom TensorFlow operations written in C++, it is important to consider the specific requirements of your models. In this case, the question mentions large batch sizes (1024 examples), large example sizes (1 MB each), and large network sizes (20 GB). As for 4 n1-highcpu-96 machines, each with 96 vCPUs and 86 GB RAM: while this configuration would provide a high number of vCPUs for custom TensorFlow operations, it lacks the GPU memory and overall RAM necessary to handle the large batch sizes and network sizes of your models.
upvoted 1 times
...
ciro_li
2 years, 3 months ago
B: https://cloud.google.com/tpu/docs/intro-to-tpu#cpus
upvoted 2 times
pinimichele01
1 year, 7 months ago
so D, not B...
upvoted 1 times
...
...
Voyager2
2 years, 5 months ago
Selected Answer: D
D: use CPU when models that contain many custom TensorFlow operations written in C++ https://cloud.google.com/tpu/docs/intro-to-tpu#cpus
upvoted 3 times
...
LoveExams
2 years, 5 months ago
Wouldn't all PC's work here? I could do this model on my own home PC just fine.
upvoted 3 times
...
M25
2 years, 6 months ago
Selected Answer: D
“writes custom TensorFlow ops in C++” -> use CPUs when “Models that contain many custom TensorFlow operations written in C++”: https://cloud.google.com/tpu/docs/intro-to-tpu#when_to_use_tpus
upvoted 2 times
...
TNT87
2 years, 8 months ago
Selected Answer: B
To determine the appropriate hardware for training the models, we need to calculate the required memory and processing power based on the size of the model and the size of the input data. Given that the batch size is 1024 and each example is 1 MB, the total size of each batch is 1024 * 1 MB = 1024 MB = 1 GB. Therefore, we need to load 1 GB of data into memory for each batch. The total size of the network is 20 GB, which means that it can fit in the memory of most modern GPUs.
upvoted 4 times
...
JeanEl
2 years, 9 months ago
Selected Answer: D
It's D
upvoted 1 times
JeanEl
2 years, 9 months ago
https://cloud.google.com/tpu/docs/tpus
upvoted 2 times
...
...
hiromi
2 years, 10 months ago
Selected Answer: D
D CPUs are recommended for TensorFlow ops written in C++ - https://cloud.google.com/tpu/docs/tensorflow-ops (Cloud TPU only supports Python)
upvoted 3 times
John_Pongthorn
2 years, 9 months ago
GPUs can be used through C++ implementations, but C is ruled out for sure.
upvoted 3 times
...
...

Topic 1 Question 117

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 117 discussion

You are an ML engineer at an ecommerce company and have been tasked with building a model that predicts how much inventory the logistics team should order each month. Which approach should you take?

  • A. Use a clustering algorithm to group popular items together. Give the list to the logistics team so they can increase inventory of the popular items.
  • B. Use a regression model to predict how much additional inventory should be purchased each month. Give the results to the logistics team at the beginning of the month so they can increase inventory by the amount predicted by the model.
  • C. Use a time series forecasting model to predict each item's monthly sales. Give the results to the logistics team so they can base inventory on the amount predicted by the model.
  • D. Use a classification model to classify inventory levels as UNDER_STOCKED, OVER_STOCKED, and CORRECTLY_STOCKED. Give the report to the logistics team each month so they can fine-tune inventory levels.
Suggested Answer: C 🗳️

Comments

mil_spyro
Highly Voted 1 year, 5 months ago
Selected Answer: C
This type of model is well-suited to predicting inventory levels because it can take into account trends and patterns in the data over time, such as seasonal fluctuations in demand or changes in customer behavior.
upvoted 11 times
...
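The per-item forecasting idea in option C, which this thread favors, can be sketched in a few lines of plain Python. The sales history, item names, and moving-average window below are illustrative assumptions, not part of the question; a real system would use a proper time series model rather than a simple rolling mean.

```python
def forecast_next_month(monthly_sales, window=3):
    """Forecast next month's sales as the mean of the last `window` months."""
    recent = monthly_sales[-window:]
    return sum(recent) / len(recent)

def inventory_plan(catalog_sales, window=3):
    """Map each item to its forecast so logistics can base inventory on it."""
    return {item: forecast_next_month(history, window)
            for item, history in catalog_sales.items()}

sales_history = {
    "sku-001": [120, 130, 125, 140, 150, 160],  # upward trend
    "sku-002": [80, 82, 79, 81, 80, 78],        # roughly flat
}
plan = inventory_plan(sales_history)
```

The point of forecasting each item (rather than a single aggregate, as in option B) is that the logistics team gets a per-SKU number to order against.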
M25
Most Recent 1 year ago
Selected Answer: C
https://cloud.google.com/learn/what-is-time-series "For example, a large retail store may have millions of items to forecast so that inventory is available when demand is high, and not overstocked when demand is low."
upvoted 1 times
...
TNT87
1 year, 2 months ago
Selected Answer: C
Answer C
upvoted 1 times
...
JeanEl
1 year, 3 months ago
Selected Answer: C
Yup it's C (Time series forecasting)
upvoted 1 times
...
ares81
1 year, 4 months ago
Selected Answer: C
Time-series forecasting model is the key expression, for me.
upvoted 1 times
...
hiromi
1 year, 4 months ago
Selected Answer: C
C (by experience) Use a time series forecasting model to predict each item's monthly sales. Give the results to the logistics team so they can base inventory on the amount predicted by the model.
upvoted 3 times
...

Topic 1 Question 118

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 118 discussion

You are building a TensorFlow model for a financial institution that predicts the impact of consumer spending on inflation globally. Due to the size and nature of the data, your model is long-running across all types of hardware, and you have built frequent checkpointing into the training process. Your organization has asked you to minimize cost. What hardware should you choose?

  • A. A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with 4 NVIDIA P100 GPUs
  • B. A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with an NVIDIA P100 GPU
  • C. A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with a non-preemptible v3-8 TPU
  • D. A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with a preemptible v3-8 TPU
Suggested Answer: D 🗳️

Comments

hiromi
Highly Voted 2 years, 10 months ago
Selected Answer: D
D you have built frequent checkpointing into the training process / minimize cost -> preemptible
upvoted 9 times
...
5a74493
Most Recent 1 year, 2 months ago
Selected Answer: C
For financial institutions, reliability and minimizing interruptions are crucial. While preemptible instances are cost-effective, they do come with the risk of being terminated unexpectedly, which might not be ideal for critical financial applications.
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: D
Follows same principle as #70
upvoted 2 times
...
Antmal
2 years, 7 months ago
Selected Answer: D
Preemptible v3-8 TPUs are the most cost-effective option for training large TensorFlow models. They are up to 80% cheaper than non-preemptible v3-8 TPUs, and they are only preempted if Google Cloud needs the resources for other workloads. In this case, the model is long-running and checkpointing is used. This means that the training process can be interrupted and resumed without losing any progress. Therefore, preemptible TPUs are a safe choice, as the training process will not be interrupted if the TPU is preempted. The other options are not as cost-effective.
upvoted 2 times
...
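The cost argument in this thread for preemptible hardware plus frequent checkpointing can be made concrete with back-of-the-envelope arithmetic. The hourly rates and the 10% re-work overhead below are illustrative assumptions, not real Google Cloud prices.

```python
def training_cost(hours, rate_per_hour, rework_overhead=0.0):
    """Total cost, inflating wall-clock time by work redone after preemptions."""
    effective_hours = hours * (1 + rework_overhead)
    return effective_hours * rate_per_hour

# On-demand v3-8 TPU at a made-up $8.00/hour:
on_demand = training_cost(hours=100, rate_per_hour=8.00)

# Preemptible v3-8 TPU at a made-up $2.40/hour; frequent checkpointing keeps
# the work lost to each preemption small (say 10% of total wall-clock time):
preemptible = training_cost(hours=100, rate_per_hour=2.40, rework_overhead=0.10)
```

Even with re-work after preemptions, the discounted rate dominates, which is why frequent checkpointing is the detail in the question that makes option D safe.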
TNT87
2 years, 8 months ago
Selected Answer: D
Answer D
upvoted 1 times
...
ares81
2 years, 10 months ago
Selected Answer: D
Frequent checkpoints --> Preemptible --> D
upvoted 4 times
...
mymy9418
2 years, 10 months ago
Selected Answer: D
preemptible is the keyword to me
upvoted 1 times
...

Topic 1 Question 119

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 119 discussion

You work for a company that provides an anti-spam service that flags and hides spam posts on social media platforms. Your company currently uses a list of 200,000 keywords to identify suspected spam posts. If a post contains more than a few of these keywords, the post is identified as spam. You want to start using machine learning to flag spam posts for human review. What is the main advantage of implementing machine learning for this business case?

  • A. Posts can be compared to the keyword list much more quickly.
  • B. New problematic phrases can be identified in spam posts.
  • C. A much longer keyword list can be used to flag spam posts.
  • D. Spam posts can be flagged using far fewer keywords.
Suggested Answer: B 🗳️

Comments

mil_spyro
Highly Voted 1 year, 5 months ago
Selected Answer: B
I vote B. Machine learning algorithms can learn to identify spam posts based on a wider range of factors, such as the content of the post, the user's behavior, and the context in which the post appears.
upvoted 9 times
...
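The advantage behind option B can be illustrated with a stdlib-only sketch: a fixed keyword list only matches what it already contains, while learning from labeled spam can surface new problematic phrases. The tiny corpus, keyword list, and thresholds below are illustrative assumptions.

```python
from collections import Counter

KEYWORDS = {"free", "winner"}  # stand-in for the existing 200,000-entry list

def keyword_flag(post, min_hits=1):
    """The current rule-based system: flag posts containing known keywords."""
    return sum(word in KEYWORDS for word in post.lower().split()) >= min_hits

def learn_new_phrases(spam_posts, ham_posts, min_count=2):
    """Surface tokens frequent in labeled spam but absent from ham and the list."""
    spam_counts = Counter(w for p in spam_posts for w in p.lower().split())
    ham_words = {w for p in ham_posts for w in p.lower().split()}
    return {w for w, c in spam_counts.items()
            if c >= min_count and w not in ham_words and w not in KEYWORDS}

spam = ["claim your cryptogiveaway now", "cryptogiveaway ends today"]
ham = ["meeting moved to today", "lunch now"]
new_terms = learn_new_phrases(spam, ham)
```

The keyword matcher can never flag "cryptogiveaway" because it is not on the list; the learned approach identifies it from labeled examples, which is exactly the business advantage the question is testing.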
M25
Most Recent 1 year ago
Selected Answer: B
https://cloud.google.com/blog/topics/developers-practitioners/how-spam-detection-taught-us-better-tech-support "Borrowing spam tech (...) Those engineers had thought through “how do we detect a new spam campaign quickly?” Spammers rapidly send bulk messages with slight variations in content (noise, misspellings, etc.) Most classification attempts would become a game of cat and mouse since it takes classifiers some time to learn about new patterns. Invoking a trend identification engine using unsupervised density clustering on unstructured text unlocked the ability for Gmail to detect ephemeral spam campaigns more quickly."
upvoted 1 times
...
TNT87
1 year, 2 months ago
Selected Answer: B
Answer B
upvoted 1 times
...
ares81
1 year, 4 months ago
Selected Answer: B
B screams machine learning with every letter.
upvoted 3 times
...
hiromi
1 year, 4 months ago
Selected Answer: B
B make sense & I agree with mill_sypro
upvoted 2 times
...

Topic 1 Question 120

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 120 discussion

One of your models is trained using data provided by a third-party data broker. The data broker does not reliably notify you of formatting changes in the data. You want to make your model training pipeline more robust to issues like this. What should you do?

  • A. Use TensorFlow Data Validation to detect and flag schema anomalies.
  • B. Use TensorFlow Transform to create a preprocessing component that will normalize data to the expected distribution, and replace values that don’t match the schema with 0.
  • C. Use tf.math to analyze the data, compute summary statistics, and flag statistical anomalies.
  • D. Use custom TensorFlow functions at the start of your model training to detect and flag known formatting errors.
Suggested Answer: A 🗳️

Comments

mil_spyro
Highly Voted 2 years, 11 months ago
Selected Answer: A
TensorFlow Data Validation (TFDV) is a library that can help you detect and flag anomalies in your dataset, such as changes in the schema or data types. https://www.tensorflow.org/tfx/data_validation/get_started
upvoted 5 times
...
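What TFDV does in option A can be approximated with a stdlib-only sketch: infer a schema from a trusted baseline, then flag fields in a new broker batch that drift from it. A real pipeline would use the tensorflow_data_validation APIs (infer_schema, validate_statistics); this only shows the idea, and the rows below are illustrative assumptions.

```python
def infer_schema(rows):
    """Map each column to the set of value-type names seen in the baseline."""
    schema = {}
    for row in rows:
        for col, val in row.items():
            schema.setdefault(col, set()).add(type(val).__name__)
    return schema

def find_anomalies(schema, rows):
    """Flag unexpected columns, missing columns, and type drift per row."""
    anomalies = []
    for i, row in enumerate(rows):
        for col, val in row.items():
            if col not in schema:
                anomalies.append((i, col, "unexpected column"))
            elif type(val).__name__ not in schema[col]:
                anomalies.append((i, col, "type mismatch"))
        for col in schema.keys() - row.keys():
            anomalies.append((i, col, "missing column"))
    return anomalies

baseline = [{"price": 10.0, "qty": 3}, {"price": 12.5, "qty": 1}]
schema = infer_schema(baseline)
# The broker silently started sending `price` as a string:
issues = find_anomalies(schema, [{"price": "10.0", "qty": 2}])
```

Detecting the format change up front, before training, is what makes the pipeline robust to an unreliable broker.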
5a74493
Most Recent 1 year, 2 months ago
Selected Answer: A
I would choose A and B, because for the model to be truly robust it needs to adapt to new formats, not just detect and flag anomalies. In this case, combining detection with adaptive preprocessing would be the best approach.
upvoted 2 times
...
M25
2 years, 6 months ago
Selected Answer: A
Went with A
upvoted 1 times
...
Yajnas_arpohc
2 years, 7 months ago
Selected Answer: A
You need to know about the problem before fixing it with Transform, hence A.
upvoted 2 times
...
TNT87
2 years, 8 months ago
Selected Answer: A
Answer A
upvoted 1 times
...
John_Pongthorn
2 years, 9 months ago
Selected Answer: A
https://www.tensorflow.org/tfx/guide/tfdv#schema_based_example_validation
upvoted 1 times
...
ares81
2 years, 10 months ago
Selected Answer: A
Tensorflow Data Validation (TFDV) can analyze training and serving data to: compute descriptive statistics, infer a schema, detect data anomalies. A.
upvoted 1 times
...
hiromi
2 years, 10 months ago
Selected Answer: A
A - https://www.tensorflow.org/tfx/data_validation/get_started
upvoted 3 times
...

Topic 1 Question 121

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 121 discussion

You work for a company that is developing a new video streaming platform. You have been asked to create a recommendation system that will suggest the next video for a user to watch. After a review by an AI Ethics team, you are approved to start development. Each video asset in your company’s catalog has useful metadata (e.g., content type, release date, country), but you do not have any historical user event data. How should you build the recommendation system for the first version of the product?

  • A. Launch the product without machine learning. Present videos to users alphabetically, and start collecting user event data so you can develop a recommender model in the future.
  • B. Launch the product without machine learning. Use simple heuristics based on content metadata to recommend similar videos to users, and start collecting user event data so you can develop a recommender model in the future.
  • C. Launch the product with machine learning. Use a publicly available dataset such as MovieLens to train a model using the Recommendations AI, and then apply this trained model to your data.
  • D. Launch the product with machine learning. Generate embeddings for each video by training an autoencoder on the content metadata using TensorFlow. Cluster content based on the similarity of these embeddings, and then recommend videos from the same cluster.
Suggested Answer: B 🗳️

Comments

tavva_prudhvi
Highly Voted 2 years, 4 months ago
Selected Answer: B
This is because you do not have any historical user event data, so you cannot use a collaborative filtering approach to build a recommender system. However, you can still use simple heuristics based on content metadata to recommend similar videos to users. For example, you could recommend videos that are in the same genre, have the same release date, or are from the same country. You should also start collecting user event data as soon as possible. This data will be valuable for training a recommender model in the future. Option D is a more complex approach that would require you to have more expertise in machine learning.(FOR THE FIRST VERSION OF THE PRODUCT)
upvoted 5 times
...
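The heuristic in option B, which this comment argues for, can be sketched with plain Python: rank candidate videos by how much content metadata they share with the one just watched, no user event data required. The catalog, fields, and scoring weights below are illustrative assumptions.

```python
def similarity(a, b):
    """Score shared metadata: same type, same country, close release years."""
    return (int(a["content_type"] == b["content_type"])
            + int(a["country"] == b["country"])
            + int(abs(a["release_year"] - b["release_year"]) <= 2))

def recommend(video_id, catalog, top_n=2):
    """Return the `top_n` most similar videos by metadata alone."""
    target = catalog[video_id]
    others = [(vid, similarity(target, meta))
              for vid, meta in catalog.items() if vid != video_id]
    others.sort(key=lambda pair: (-pair[1], pair[0]))  # best score first
    return [vid for vid, _ in others[:top_n]]

catalog = {
    "v1": {"content_type": "drama", "country": "US", "release_year": 2020},
    "v2": {"content_type": "drama", "country": "US", "release_year": 2021},
    "v3": {"content_type": "comedy", "country": "FR", "release_year": 2005},
}
recs = recommend("v1", catalog)
```

A heuristic like this ships quickly for the first version while the user event data needed for a learned recommender accumulates.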
jamesking1103
Highly Voted 2 years, 10 months ago
Selected Answer: D
https://developers.google.com/machine-learning/crash-course/embeddings/categorical-input-data
upvoted 5 times
...
Pau1234
Most Recent 11 months, 1 week ago
Selected Answer: B
When no data: recommend similar videos to the users. D is too complex.
upvoted 3 times
...
LFavero
1 year, 8 months ago
Selected Answer: B
D is overkill
upvoted 1 times
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: B
My choice is B. This is because both B and D have the same goal (recommendation based on content), but option B is simpler for this initial context.
upvoted 2 times
...
Voyager2
2 years, 5 months ago
Selected Answer: B
B: https://developers.google.com/machine-learning/guides/rules-of-ml
upvoted 3 times
...
aw_49
2 years, 5 months ago
Selected Answer: B
B, since we can't use other encoded data to test on some other system.
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: D
Went with D
upvoted 3 times
...
Sas02
2 years, 6 months ago
Selected Answer: D
https://cloud.google.com/blog/topics/developers-practitioners/meet-ais-multitool-vector-embeddings Option D is about creating clusters based on the content metadata and using that to provide recos to users
upvoted 2 times
...
Yajnas_arpohc
2 years, 7 months ago
Key is the mention “first version of product”
upvoted 1 times
...
guilhermebutzke
2 years, 8 months ago
Selected Answer: D
It is possible to create a recommendation system just using metadata information, like in: https://developers.google.com/machine-learning/crash-course/embeddings/categorical-input-data One of the initial problems of recommender systems is precisely the lack of data for collaborative recommendation. However, this does not prevent other recommendation algorithms, for example, those that use content suggestion.
upvoted 3 times
guilhermebutzke
1 year, 9 months ago
Change my mind: My choice is B. This is because both B and D have the same goal (recommendation based on content), but option B is simpler for this initial context.
upvoted 1 times
...
...
TNT87
2 years, 8 months ago
Selected Answer: B
Since you do not have any historical user event data, options C and D are not suitable. In this scenario, it is better to start with a simpler approach, so options A and B are the most suitable. However, option B is preferred because it uses some logic based on content metadata to provide recommendations, which may be more personalized and relevant than presenting videos in alphabetical order. Additionally, collecting user event data from the beginning will help improve the recommendation system in the future.
upvoted 2 times
...
enghabeth
2 years, 9 months ago
Selected Answer: B
ans B, you need something easier to implement
upvoted 1 times
...
hiromi
2 years, 10 months ago
Selected Answer: B
B - https://developers.google.com/machine-learning/guides/rules-of-ml
upvoted 3 times
...
mymy9418
2 years, 10 months ago
Selected Answer: B
Because there is no user event data, a pretrained model won't help.
upvoted 2 times
...

Topic 1 Question 122

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 122 discussion

You recently built the first version of an image segmentation model for a self-driving car. After deploying the model, you observe a decrease in the area under the curve (AUC) metric. When analyzing the video recordings, you also discover that the model fails in highly congested traffic but works as expected when there is less traffic. What is the most likely reason for this result?

  • A. The model is overfitting in areas with less traffic and underfitting in areas with more traffic.
  • B. AUC is not the correct metric to evaluate this classification model.
  • C. Too much data representing congested areas was used for model training.
  • D. Gradients become small and vanish while backpropagating from the output to input nodes.
Suggested Answer: A 🗳️

Comments

fitri001
Highly Voted 1 year, 6 months ago
Selected Answer: A
Overfitting and Underfitting: Overfitting describes a model that performs well on the training data but struggles with unseen data. Underfitting signifies a model that hasn't learned enough patterns from the training data. Congested Traffic as Unseen Data: If the training data primarily consisted of scenarios with less traffic, the model might not have been exposed to enough examples of congested situations. This would make congested traffic act like unseen data, leading to underfitting and poor performance.
upvoted 5 times
...
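The failure pattern described above, where a model looks fine overall but fails on an underrepresented condition, is easiest to see with sliced evaluation: compute the metric per traffic condition instead of one aggregate number. The toy labels and predictions below are illustrative assumptions.

```python
def sliced_accuracy(examples):
    """Accuracy per slice; each example is (slice_name, label, prediction)."""
    totals, hits = {}, {}
    for slice_name, label, pred in examples:
        totals[slice_name] = totals.get(slice_name, 0) + 1
        hits[slice_name] = hits.get(slice_name, 0) + int(label == pred)
    return {s: hits[s] / totals[s] for s in totals}

eval_set = [
    ("light_traffic", 1, 1), ("light_traffic", 0, 0), ("light_traffic", 1, 1),
    ("heavy_traffic", 1, 0), ("heavy_traffic", 1, 0), ("heavy_traffic", 0, 0),
]
per_slice = sliced_accuracy(eval_set)
# The aggregate number hides the problem; the heavy-traffic slice exposes it,
# consistent with option A's underfitting on congested scenes.
```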
hybridpro
Most Recent 1 year, 1 month ago
Selected Answer: B
AUC makes sense in a binary classification problem, but that's not the case here. That is the biggest red flag right there in the question.
upvoted 1 times
...
andresvelasco
2 years, 2 months ago
Selected Answer: B
very tricky, here is my view:
A. The model is overfitting in areas with less traffic and underfitting in areas with more traffic. > I don't think so, because "the model fails in highly congested traffic but works as expected when there is less traffic", which means it is NOT OVERFITTING with less traffic. Actually the contrary would make more sense.
B. AUC is not the correct metric to evaluate this classification model. > My option. Image segmentation is about distinguishing objects; not sure AUC is right for this.
C. Too much data representing congested areas was used for model training. > Can't be. That would actually make it perform at least as well.
D. Gradients become small and vanish while backpropagating from the output to input nodes. > No clue.
upvoted 2 times
tavva_prudhvi
2 years ago
Overfitting occurs when a model learns the details and noise in the training data to the extent that it negatively impacts the performance of the model on new data. If your training data included more examples of less congested areas, the model might have overfitted to these scenarios and, as a result, performs poorly in unrepresented or underrepresented situations, such as heavy traffic. AUC (Area Under the Curve) is a widely used metric for evaluating the performance of classification models. However, it might not be the sole or most appropriate metric for a complex task like image segmentation in self-driving cars. Other metrics like Intersection over Union (IoU) or pixel accuracy might be more relevant for evaluating segmentation tasks. Still, this doesn't explain the model's performance drop in different traffic conditions.
upvoted 2 times
...
...
PST21
2 years, 3 months ago
Selected Answer: D
D. Gradients become small and vanish while backpropagating from the output to input nodes. This issue is known as the vanishing gradient problem, which can occur during the training of deep neural networks. In highly congested traffic scenes, there might be complex patterns and details that the image segmentation model needs to capture. However, if the model architecture is too deep and the gradients become very small during backpropagation, the model may struggle to update its weights effectively to learn these complex patterns. As a result, the model may fail to correctly segment objects in congested traffic scenes, leading to a decrease in performance. Vanishing gradients can prevent the model from effectively learning representations and features in the deeper layers of the network. It's possible that the model is working fine in less congested areas because the patterns are simpler and easier to learn, allowing the gradients to propagate more effectively.
upvoted 2 times
maukaba
2 years ago
There's a paper saying that: https://www.sciencedirect.com/science/article/abs/pii/S0925231218313821
upvoted 1 times
...
andresvelasco
2 years, 2 months ago
mmm, possibly
upvoted 1 times
andresvelasco
2 years, 2 months ago
Don't think so, because "the model fails in highly congested traffic but works as expected when there is less traffic", which means it is NOT OVERFITTING with less traffic. Actually the contrary would make more sense.
upvoted 1 times
...
...
...
M25
2 years, 6 months ago
Selected Answer: A
Went with A
upvoted 1 times
...
Antmal
2 years, 6 months ago
Selected Answer: A
The most likely reason for this result is the model is overfitting in areas with less traffic and underfitting in areas with more traffic. Probably because the model was trained on a dataset that did not have enough examples of congested traffic. As a result, the model is not able to generalise well. When the model is validated on congested traffic, it makes mistakes because it has not seen this type of data before.
upvoted 1 times
andresvelasco
2 years, 2 months ago
Don't think so, because "the model fails in highly congested traffic but works as expected when there is less traffic", which means it is NOT OVERFITTING with less traffic. Actually the contrary would make more sense.
upvoted 1 times
...
...
TNT87
2 years, 8 months ago
Selected Answer: A
Answer A
upvoted 1 times
andresvelasco
2 years, 2 months ago
Don't think so, because "the model fails in highly congested traffic but works as expected when there is less traffic", which means it is NOT OVERFITTING with less traffic. Actually the contrary would make more sense.
upvoted 2 times
...
...
enghabeth
2 years, 9 months ago
Selected Answer: A
the model was trained with bias
upvoted 1 times
...
hiromi
2 years, 10 months ago
Selected Answer: A
A It's an example of overfitting/underfitting problem
upvoted 3 times
...
mil_spyro
2 years, 11 months ago
Selected Answer: A
I vote A, it is likely that the model was trained on data that included mostly images of less congested traffic, and therefore did not generalize well to images of more congested traffic.
upvoted 4 times
...

Topic 1 Question 123

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 123 discussion

You are developing an ML model to predict house prices. While preparing the data, you discover that an important predictor variable, distance from the closest school, is often missing and does not have high variance. Every instance (row) in your data is important. How should you handle the missing data?

  • A. Delete the rows that have missing values.
  • B. Apply feature crossing with another column that does not have missing values.
  • C. Predict the missing values using linear regression.
  • D. Replace the missing values with zeros.
Suggested Answer: C 🗳️

Comments

fitri001
1 year ago
Selected Answer: C
Preserves information: deleting rows (Option A) throws away valuable data, especially since every instance is important.
Not an applicable technique: feature crossing (Option B) creates new features by multiplying existing features; it wouldn't address missing values directly.
Zero imputation might bias: replacing missing values with zeros (Option D) can introduce bias if zeros have a specific meaning in the data (e.g., distance cannot be zero).
upvoted 4 times
...
Voyager2
1 year, 11 months ago
Went with C: predict the missing values using linear regression, as the data does not have high variance.
upvoted 4 times
...
M25
2 years ago
Selected Answer: C
Went with C
upvoted 1 times
...
TNT87
2 years, 2 months ago
Selected Answer: C
Answer is C Predicting the missing values using linear regression can be a good approach, especially if the variable is important for the prediction. The values can be imputed using regression, where the missing variable can be the dependent variable, and other relevant variables can be used as predictors
upvoted 2 times
...
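The regression imputation described in this thread (option C) can be sketched with stdlib-only least squares: fit a line on the complete rows, then impute the missing distances. Using house size as the single predictor variable, and the toy rows below, are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b

rows = [  # (house_size_m2, distance_to_school_km); None marks a missing value
    (100, 2.0), (150, 3.0), (200, 4.0), (125, None), (175, None),
]
complete = [(x, y) for x, y in rows if y is not None]
a, b = fit_line([x for x, _ in complete], [y for _, y in complete])
imputed = [(x, y if y is not None else a + b * x) for x, y in rows]
```

Every row is kept, and the imputed values follow the relationship learned from the complete rows instead of an arbitrary constant like zero.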
John_Pongthorn
2 years, 3 months ago
Selected Answer: C
Regression: https://cran.r-project.org/web/packages/miceRanger/vignettes/miceAlgorithm.html
• Find linear or non-linear relationships between the missing feature and other features
• Most advanced technique: MICE (Multiple Imputation by Chained Equations)
upvoted 1 times
...
ares81
2 years, 4 months ago
Selected Answer: C
It's C.
upvoted 1 times
...
daran
2 years, 4 months ago
My answer was based on the below article https://towardsdatascience.com/7-ways-to-handle-missing-values-in-machine-learning-1a6326adf79e
upvoted 1 times
...
daran
2 years, 4 months ago
One of the ways to handle missing data is deleting the rows. but question here says that every row is important. so I think another possible option could be to predict the missing value. Option C could be correct !
upvoted 2 times
...
hiromi
2 years, 4 months ago
Selected Answer: C
C (not sure)
upvoted 2 times
...
pshemol
2 years, 4 months ago
Selected Answer: C
A: no, every row is important.
B: no, a product of other feature values with missing values makes no sense to me.
D: no, a zero value would bias the model, as zero distance from a school has the highest value to the model.
C: yes, there is an approach using linear regression to predict missing values.
upvoted 2 times
...

Topic 1 Question 124

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 124 discussion

You are an ML engineer responsible for designing and implementing training pipelines for ML models. You need to create an end-to-end training pipeline for a TensorFlow model. The TensorFlow model will be trained on several terabytes of structured data. You need the pipeline to include data quality checks before training and model quality checks after training but prior to deployment. You want to minimize development time and the need for infrastructure maintenance. How should you build and orchestrate your training pipeline?

  • A. Create the pipeline using Kubeflow Pipelines domain-specific language (DSL) and predefined Google Cloud components. Orchestrate the pipeline using Vertex AI Pipelines.
  • B. Create the pipeline using TensorFlow Extended (TFX) and standard TFX components. Orchestrate the pipeline using Vertex AI Pipelines.
  • C. Create the pipeline using Kubeflow Pipelines domain-specific language (DSL) and predefined Google Cloud components. Orchestrate the pipeline using Kubeflow Pipelines deployed on Google Kubernetes Engine.
  • D. Create the pipeline using TensorFlow Extended (TFX) and standard TFX components. Orchestrate the pipeline using Kubeflow Pipelines deployed on Google Kubernetes Engine.
Suggested Answer: B 🗳️

Comments

fitri001
Highly Voted 1 year ago
Selected Answer: B
TFX for TensorFlow Models: TensorFlow Extended (TFX) is an end-to-end machine learning platform built on top of TensorFlow. It provides a set of pre-built components specifically designed for TensorFlow models, simplifying development and ensuring compatibility. Vertex AI Pipelines for Orchestration: Vertex AI Pipelines, a managed service from Google Cloud, is ideal for orchestrating ML pipelines. It integrates seamlessly with TFX and provides features like monitoring, scheduling, and scaling, reducing infrastructure maintenance needs.
upvoted 5 times
fitri001
1 year ago
A. Kubeflow Pipelines with predefined components: while Kubeflow Pipelines offer a DSL for building pipelines, using standard TFX components within Vertex AI Pipelines offers a more streamlined solution designed for TensorFlow models.
C & D. Kubeflow Pipelines with manual deployment: both options involve using Kubeflow Pipelines, but deploying it on Google Kubernetes Engine requires additional infrastructure management compared to using the managed service, Vertex AI Pipelines.
upvoted 5 times
...
...
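The pipeline shape in option B, a data quality gate before training and a model quality gate before deployment, can be sketched with plain functions. The toy "model" and thresholds below are illustrative stand-ins, not TFX APIs; in a real TFX pipeline these roles are played by components such as ExampleValidator, Trainer, and Evaluator, orchestrated by Vertex AI Pipelines.

```python
def data_quality_check(rows):
    """Reject batches with missing labels (the ExampleValidator role)."""
    return all(row.get("label") is not None for row in rows)

def train(rows):
    """Toy 'model': predict the most common label (the Trainer role)."""
    labels = [row["label"] for row in rows]
    return max(set(labels), key=labels.count)

def model_quality_check(model, rows, threshold=0.6):
    """Gate deployment on accuracy (the Evaluator role)."""
    accuracy = sum(row["label"] == model for row in rows) / len(rows)
    return accuracy >= threshold

def run_pipeline(rows):
    if not data_quality_check(rows):
        return "rejected: data quality"
    model = train(rows)
    if not model_quality_check(model, rows):
        return "rejected: model quality"
    return f"deployed: predict {model!r}"

result = run_pipeline([{"label": "ok"}, {"label": "ok"}, {"label": "bad"}])
```

The value of the managed combination in option B is that these gates come as prebuilt components, so the team writes configuration rather than maintaining orchestration infrastructure.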
NickHapton
Most Recent 1 year, 10 months ago
B. Why not C? As the question mentions, this model is built with TensorFlow.
upvoted 3 times
...
SamuelTsch
1 year, 10 months ago
Selected Answer: B
B should be correct
upvoted 2 times
...
M25
2 years ago
Selected Answer: B
Went with B
upvoted 3 times
...
JamesDoe
2 years, 1 month ago
Selected Answer: B
B. Straight from the docs: https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline#sdk
upvoted 4 times
...
TNT87
2 years, 2 months ago
Selected Answer: B
B. Create the pipeline using TensorFlow Extended (TFX) and standard TFX components. Orchestrate the pipeline using Vertex AI Pipelines. TFX provides a set of standard components for building end-to-end ML pipelines, including data validation and model analysis. Vertex AI Pipelines is a fully managed service for building and orchestrating machine learning pipelines on Google Cloud.
upvoted 2 times
...
ares81
2 years, 4 months ago
Selected Answer: B
It's B!
upvoted 2 times
...
egdiaa
2 years, 4 months ago
Selected Answer: B
Reference: https://www.tensorflow.org/tfx/guide/tfdv
upvoted 3 times
...
hiromi
2 years, 4 months ago
Selected Answer: B
B (not sure)
upvoted 3 times
...

Topic 1 Question 125

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 125 discussion

You manage a team of data scientists who use a cloud-based backend system to submit training jobs. This system has become very difficult to administer, and you want to use a managed service instead. The data scientists you work with use many different frameworks, including Keras, PyTorch, theano, scikit-learn, and custom libraries. What should you do?

  • A. Use the Vertex AI Training to submit training jobs using any framework.
  • B. Configure Kubeflow to run on Google Kubernetes Engine and submit training jobs through TFJob.
  • C. Create a library of VM images on Compute Engine, and publish these images on a centralized repository.
  • D. Set up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure.
Suggested Answer: A

Comments

fitri001
Highly Voted 1 year ago
Selected Answer: A
Managed Service: It eliminates the need to administer a complex backend system, reducing your team's workload. Framework Agnostic: Vertex AI Training supports various frameworks like Keras, PyTorch, scikit-learn, and custom libraries, aligning with your data scientists' needs.
upvoted 5 times
fitri001
1 year ago
B. Kubeflow on GKE with TFJob: While Kubeflow offers framework flexibility, setting it up and managing it on Google Kubernetes Engine (GKE) adds complexity compared to a fully managed service like Vertex AI Training. C. VM Image Library: Creating and maintaining a library of VM images for every framework is cumbersome and doesn't scale well for various frameworks and custom libraries. D. Slurm Workload Manager: Slurm is a workload manager, not a training service. It wouldn't directly address the need for framework-agnostic training job submission.
upvoted 3 times
...
...
AntoGrd
Most Recent 1 year, 2 months ago
Selected Answer: A
Answer A
upvoted 2 times
...
SamuelTsch
1 year, 10 months ago
Selected Answer: A
Same as question 5.
upvoted 2 times
...
Voyager2
1 year, 11 months ago
Went with A. Use the Vertex AI Training to submit training jobs using any framework. As the request states managed service I will discard Compute Engine and Kubernetes. I discarded D since Google is not in the picture
upvoted 2 times
...
TNT87
2 years, 2 months ago
Selected Answer: A
Answer A
upvoted 1 times
...
hiromi
2 years, 4 months ago
Selected Answer: A
A (similar question 5)
upvoted 3 times
...
mymy9418
2 years, 4 months ago
Selected Answer: A
https://www.examtopics.com/discussions/google/view/54653-exam-professional-machine-learning-engineer-topic-1-question/
upvoted 2 times
...

Topic 1 Question 126


You are training an object detection model using a Cloud TPU v2. Training time is taking longer than expected. Based on this simplified trace obtained with a Cloud TPU profile, what action should you take to decrease training time in a cost-efficient way?

  • A. Move from Cloud TPU v2 to Cloud TPU v3 and increase batch size.
  • B. Move from Cloud TPU v2 to 8 NVIDIA V100 GPUs and increase batch size.
  • C. Rewrite your input function to resize and reshape the input images.
  • D. Rewrite your input function using parallel reads, parallel processing, and prefetch.
Suggested Answer: D

Comments

pshemol
Highly Voted 2 years, 4 months ago
Selected Answer: D
parallel reads, parallel processing, and prefetch is needed here
upvoted 8 times
...
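The bottleneck option D targets can be illustrated without TensorFlow: overlapping blocking reads hides input latency, which is what tf.data's num_parallel_calls/num_parallel_reads and prefetch() do for a real input pipeline. A plain-Python analog, where a 50 ms sleep stands in for one file read:

```python
import concurrent.futures
import time

def slow_read(shard):
    """Simulate one blocking input read (e.g., an image file or record shard)."""
    time.sleep(0.05)
    return shard * 2

shards = list(range(8))

# Serial reads: the accelerator analog idles through every 50 ms wait.
t0 = time.perf_counter()
serial = [slow_read(s) for s in shards]
t_serial = time.perf_counter() - t0

# Parallel reads overlap the waits, which is what tf.data does with
# num_parallel_calls / num_parallel_reads, while prefetch() overlaps
# input preparation with the training step itself.
t0 = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(slow_read, shards))
t_parallel = time.perf_counter() - t0

print(serial == parallel)       # True: same data either way
print(t_parallel < t_serial)    # True: only the wall-clock time changes
```

On a TPU the same effect shows up as the accelerator waiting on the host input pipeline, which matches the input-bound trace described in the question.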
fitri001
Most Recent 1 year ago
Selected Answer: D
Optimizing the data pipeline with parallel reads, processing, and prefetching can significantly improve training speed on TPUs by reducing I/O wait times. This approach utilizes the TPU's capabilities more effectively and avoids extra costs associated with hardware upgrades.
upvoted 4 times
fitri001
1 year ago
A. Moving to a different TPU version (v3) and increasing the batch size might improve training speed, but it's an expensive solution without a guarantee of the most efficient outcome. B. Switching to GPUs (V100) also increases costs and may not be optimized for your specific workload.
upvoted 2 times
fitri001
1 year ago
(C) can be part of the preprocessing step, but it likely won't address the core issue if the bottleneck is related to how data is being fed into the training process.
upvoted 2 times
...
...
...
M25
2 years ago
Selected Answer: D
Went with D
upvoted 1 times
...
TNT87
2 years, 2 months ago
Selected Answer: D
Based on the profile, it appears that the Compute time is relatively low compared to the HostToDevice and DeviceToHost time. This suggests that the data transfer between the host (CPU) and the TPU device is a bottleneck. Therefore, the best action to decrease training time in a cost-efficient way would be to reduce the amount of data transferred between the host and the device.
upvoted 2 times
...
hiromi
2 years, 4 months ago
Selected Answer: D
D - https://www.tensorflow.org/guide/data_performance
upvoted 4 times
...
mymy9418
2 years, 4 months ago
Selected Answer: D
i didn't see v3 has any benefit than v2 https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#performance_benefits_of_tpu_v3_over_v2
upvoted 1 times
...

Topic 1 Question 127


While performing exploratory data analysis on a dataset, you find that an important categorical feature has 5% null values. You want to minimize the bias that could result from the missing values. How should you handle the missing values?

  • A. Remove the rows with missing values, and upsample your dataset by 5%.
  • B. Replace the missing values with the feature’s mean.
  • C. Replace the missing values with a placeholder category indicating a missing value.
  • D. Move the rows with missing values to your validation dataset.
Suggested Answer: C

Comments

fitri001
1 year ago
Selected Answer: C
Minimizes bias: removing rows (A) can introduce bias if the missingness is not random, and upsampling the remaining data might not address the underlying cause of the missing values. Unsuitable for categorical features: replacing with the mean (B) only works for numerical features. Transparency and model interpretation: a placeholder category (C) explicitly acknowledges the missing data, avoids introducing assumptions during model training, and also improves model interpretability. Validation set contamination: moving rows with missing values to the validation set (D) contaminates the validation data and hinders its ability to assess model performance on unseen data. Using a placeholder category creates a separate category for missing values, allowing the model to handle them explicitly. This approach is particularly suitable for categorical features with a relatively small percentage of missing values (like 5% in this case).
upvoted 4 times
pinimichele01
1 year ago
What if B said mode instead of mean?
upvoted 1 times
...
...
M25
2 years ago
Selected Answer: C
http://webcache.googleusercontent.com/search?q=cache:FzNjYfqNEZ0J:https://towardsdatascience.com/missing-values-dont-drop-them-f01b1d8ff557&hl=de&gl=de&strip=1&vwsrc=0 See also #62, #123
upvoted 1 times
M25
2 years ago
Also, tab "Forecasting": "For forecasting models, null values are imputed from the surrounding data. (There is no option to leave a null value as null.) If you would prefer to control the way null values are imputed, you can impute them explicitly. The best values to use might depend on your data and your business problem. Missing rows (for example, no row for a specific date, with a data granularity of daily) are allowed, but Vertex AI does not impute values for the missing data. Because missing rows can decrease model quality, you should avoid missing rows where possible. For example, if a row is missing because sales quantity for that day was zero, add a row for that day and explicitly set sales data to 0." https://cloud.google.com/vertex-ai/docs/datasets/data-types-tabular#null-values
upvoted 1 times
...
...
TNT87
2 years, 2 months ago
Selected Answer: C
C. Replace the missing values with a placeholder category indicating a missing value. This approach is often referred to as "imputing" missing values, and it is a common technique for dealing with missing data in categorical features. By using a placeholder category, you explicitly indicate that the value is missing, rather than assuming that the missing value is a particular category. This can help to minimize bias in downstream analyses, as it does not introduce any assumptions about the missing data that could bias your results.
upvoted 3 times
...
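Option C is a one-liner in practice. A minimal pandas sketch on a toy loan-style feature, where the "MISSING" label is a hypothetical placeholder of our choosing:

```python
import pandas as pd

# Toy loan feature with missing entries (illustrative values).
df = pd.DataFrame(
    {"home_ownership": ["RENT", "OWN", None, "MORTGAGE", "OWN", None]}
)

# Option C: make missingness its own category. Dropping rows (A) can bias
# the sample, and a "mean" (B) is undefined for a categorical feature.
df["home_ownership"] = df["home_ownership"].fillna("MISSING")

print(df["home_ownership"].tolist())
# ['RENT', 'OWN', 'MISSING', 'MORTGAGE', 'OWN', 'MISSING']
```

At training time the model then learns a weight or embedding for the MISSING category itself, instead of inheriting whatever assumptions an imputed value would carry.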
shankalman717
2 years, 2 months ago
Selected Answer: C
When handling missing values in a categorical feature, replacing the missing values with a placeholder category indicating a missing value, as described in option C, is the most appropriate solution in order to minimize bias that could result from the missing values. This approach allows the algorithm to treat missing values as a separate category, avoiding the risk of any assumptions being made about the missing values. Option A, removing the rows with missing values and upsampling the dataset by 5%, can lead to a loss of valuable data and can also introduce bias into the data. This approach can lead to overrepresentation of certain classes and underrepresentation of others. Option B, replacing the missing values with the feature's mean, is not appropriate for categorical features as there is no meaningful average value for categorical features. Option D, moving the rows with missing values to the validation dataset, is not a good solution. This approach may introduce bias into the validation dataset and can lead to overfitting.
upvoted 3 times
...
ailiba
2 years, 2 months ago
I don't really understand the concept of C. What information should the model learn from that missing-value category?
upvoted 1 times
...
jdeix
2 years, 3 months ago
If you want to minimize the bias, why not use the mean?
upvoted 2 times
rayban3981
2 years, 3 months ago
It is a categorical field, so you can replace with the mode, not the mean.
upvoted 2 times
...
...
ares81
2 years, 4 months ago
Selected Answer: C
C, for me.
upvoted 1 times
...
hargur
2 years, 4 months ago
C looks correct. We should replace the values with a placeholder.
upvoted 2 times
...
hiromi
2 years, 4 months ago
Selected Answer: C
C (not sure)
upvoted 1 times
...

Topic 1 Question 128


You are an ML engineer on an agricultural research team working on a crop disease detection tool to detect leaf rust spots in images of crops to determine the presence of a disease. These spots, which can vary in shape and size, are correlated to the severity of the disease. You want to develop a solution that predicts the presence and severity of the disease with high accuracy. What should you do?

  • A. Create an object detection model that can localize the rust spots.
  • B. Develop an image segmentation ML model to locate the boundaries of the rust spots.
  • C. Develop a template matching algorithm using traditional computer vision libraries.
  • D. Develop an image classification ML model to predict the presence of the disease.
Suggested Answer: B

Comments

Nayak8
Highly Voted 2 years, 4 months ago
Selected Answer: B
Not D because Classification can't predict the severity for that we need Segmentation
upvoted 8 times
...
fitri001
Most Recent 1 year ago
Selected Answer: B
Rust spot location and size: object detection (A) primarily focuses on identifying and bounding the location of objects. While it can detect the presence of rust spots, it wouldn't capture the variations in size and shape that correlate with disease severity. Detailed boundaries: image classification (D) would only predict the presence or absence of the disease based on the entire image; it wouldn't provide details about the location or extent of the rust spots. Template matching (C) with traditional libraries might be computationally expensive and would struggle with the variability in spot shapes and sizes.
upvoted 3 times
...
julliet
1 year, 11 months ago
Selected Answer: B
only B gets the severity here
upvoted 2 times
...
M25
2 years ago
Selected Answer: B
Object Detection [Option A] and Image Segmentation [Option B]: https://www.oreilly.com/library/view/practical-machine-learning/9781098102357/ch04.html Image Recognition [Option D]: https://www.oreilly.com/library/view/practical-machine-learning/9781098102357/ch03.html#image_vision
upvoted 2 times
...
TNT87
2 years, 2 months ago
Selected Answer: B
B. Develop an image segmentation ML model to locate the boundaries of the rust spots. An image segmentation model is well-suited for this task because it can identify the exact location and shape of the rust spots in the image, which is critical for determining the severity of the disease. Once the rust spots have been identified, other algorithms can be used to analyze the data and predict the severity of the disease. Object detection models are another option, but they may not be as accurate as image segmentation models when it comes to identifying the exact boundaries of the rust spots. Template matching algorithms using traditional computer vision libraries are generally not as accurate as ML models when it comes to image analysis.
upvoted 2 times
andresvelasco
1 year, 8 months ago
Your reasoning is correct, but the headline says: "crop disease detection tool to detect leaf rust spots in images of crops to determine the presence of a disease". So I understand that the output after processing the images is Disease/No Disease, which I guess could be achieved with classification.
upvoted 2 times
LFavero
1 year, 2 months ago
"You want to develop a solution that predicts the presence and severity of the disease with high accuracy." Therefore, detection only is not the best solution as it is not reliable as to the size of the detected object (rust spot)
upvoted 1 times
...
...
...
q2ng
2 years, 4 months ago
Selected Answer: B
the shape of the spot is quite important for the severity of the disease, and image segmentation could help us to determine it in a more granular manner. And it is often used in the healthcare industry, for getting the shapes of all the cancerous cells
upvoted 3 times
...
Abhijat
2 years, 4 months ago
Selected Answer: B
Answer B
upvoted 2 times
...
Dataspire
2 years, 4 months ago
Selected Answer: B
To determine severity of the disease, boundary of rust spots should be determined - for size/ shape etc.
upvoted 4 times
...
hiromi
2 years, 4 months ago
Selected Answer: D
D should work
upvoted 3 times
...
MithunDesai
2 years, 4 months ago
Selected Answer: D
I think D
upvoted 1 times
...

Topic 1 Question 129


You have been asked to productionize a proof-of-concept ML model built using Keras. The model was trained in a Jupyter notebook on a data scientist’s local machine. The notebook contains a cell that performs data validation and a cell that performs model analysis. You need to orchestrate the steps contained in the notebook and automate the execution of these steps for weekly retraining. You expect much more training data in the future. You want your solution to take advantage of managed services while minimizing cost. What should you do?

  • A. Move the Jupyter notebook to a Notebooks instance on the largest N2 machine type, and schedule the execution of the steps in the Notebooks instance using Cloud Scheduler.
  • B. Write the code as a TensorFlow Extended (TFX) pipeline orchestrated with Vertex AI Pipelines. Use standard TFX components for data validation and model analysis, and use Vertex AI Pipelines for model retraining.
  • C. Rewrite the steps in the Jupyter notebook as an Apache Spark job, and schedule the execution of the job on ephemeral Dataproc clusters using Cloud Scheduler.
  • D. Extract the steps contained in the Jupyter notebook as Python scripts, wrap each script in an Apache Airflow BashOperator, and run the resulting directed acyclic graph (DAG) in Cloud Composer.
Suggested Answer: B

Comments

Antmal
Highly Voted 1 year ago
Selected Answer: B
I believe it is B: write the code as a TensorFlow Extended (TFX) pipeline orchestrated with Vertex AI Pipelines, use standard TFX components for data validation and model analysis, and use Vertex AI Pipelines for model retraining. Because: solution A is not scalable, will be expensive to run, and does not take advantage of managed services. Solution C is more scalable than option A, but still not as scalable as TFX with Vertex AI Pipelines, and it also does not take advantage of managed services. Solution D is the most flexible, but also the most complex: it requires more knowledge of Apache Airflow and is more difficult to manage. Overall, the best way to productionize the proof-of-concept ML model is to use TFX and Vertex AI Pipelines: the solution is scalable, reliable, and easy to manage, and it takes advantage of managed services, which helps to reduce costs.
upvoted 5 times
...
Fer660
Most Recent 2 months, 2 weeks ago
Selected Answer: B
Only B avoids an extra network hop, and network hops can be expensive. It is possible to cache the model within DataFlow in order to get a single-record inference quickly.
upvoted 1 times
...
M25
1 year ago
Selected Answer: B
Went with B
upvoted 3 times
...
TNT87
1 year, 2 months ago
Selected Answer: B
B. Write the code as a TensorFlow Extended (TFX) pipeline orchestrated with Vertex AI Pipelines. Use standard TFX components for data validation and model analysis, and use Vertex AI Pipelines for model retraining. The reason for this choice is that TFX and Vertex AI Pipelines provide a scalable and cost-effective solution for productionizing machine learning models. TFX is an end-to-end ML platform for building scalable and repeatable ML workflows, while Vertex AI Pipelines provides a fully managed service for orchestrating ML workflows at scale. By using TFX and Vertex AI Pipelines, you can automate the execution of the steps contained in the Jupyter notebook, and schedule the pipeline for weekly retraining. This approach also takes advantage of managed services, which helps to minimize cost.
upvoted 3 times
...
ares81
1 year, 4 months ago
Selected Answer: B
All the others look really wrong, so B.
upvoted 2 times
...
hiromi
1 year, 4 months ago
Selected Answer: B
B (not sure)
upvoted 2 times
...

Topic 1 Question 130


You are working on a system log anomaly detection model for a cybersecurity organization. You have developed the model using TensorFlow, and you plan to use it for real-time prediction. You need to create a Dataflow pipeline to ingest data via Pub/Sub and write the results to BigQuery. You want to minimize the serving latency as much as possible. What should you do?

  • A. Containerize the model prediction logic in Cloud Run, which is invoked by Dataflow.
  • B. Load the model directly into the Dataflow job as a dependency, and use it for prediction.
  • C. Deploy the model to a Vertex AI endpoint, and invoke this endpoint in the Dataflow job.
  • D. Deploy the model in a TFServing container on Google Kubernetes Engine, and invoke it in the Dataflow job.
Suggested Answer: B

Comments

guilhermebutzke
Highly Voted 1 year, 9 months ago
Selected Answer: C
C. According Google: "Instead of deploying the model to an endpoint, you can use the RunInference API to serve machine learning models in your Apache Beam pipeline. This approach has several advantages, including flexibility and portability. However, deploying the model in Vertex AI offers many additional benefits, such as the platform's built-in tools for model monitoring, TensorBoard, and model registry governance. Vertex AI also provides the ability to use Optimized TensorFlow runtime in your endpoints. To do this, simply specify the TensorFlow runtime container when you deploy your model." https://cloud.google.com/blog/products/ai-machine-learning/streaming-prediction-with-dataflow-and-vertex
upvoted 9 times
f084277
12 months ago
The quote you cite describes B as the right answer, not C... the question asks *only* about minimizing latency.
upvoted 4 times
...
...
f084277
Highly Voted 12 months ago
Selected Answer: B
the question asks *only* about minimizing latency. Doing everything in Dataflow minimizes latency over all the other options.
upvoted 8 times
...
OpenKnowledge
Most Recent 3 weeks, 6 days ago
Selected Answer: B
RunInference API can be used to invoke the prediction model from within the Dataflow pipeline
upvoted 1 times
...
dija123
1 month, 1 week ago
Selected Answer: B
Options A, C, and D all introduce a significant source of latency: a network hop. A (Cloud Run), C (Vertex AI Endpoint), and D (GKE).
upvoted 1 times
...
desertlotus1211
8 months, 1 week ago
Selected Answer: B
By loading the TensorFlow model directly into the Dataflow job, you ensure that inference happens inline within the pipeline on the worker nodes. Using external endpoints (Options A, C, and D) introduces extra latency due to network round trips, which is not ideal for real-time prediction in a cybersecurity context
upvoted 3 times
...
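The pattern behind option B is to load the model once per worker and run inference in-process; Beam exposes this through the DoFn.setup lifecycle hook (and, more recently, the RunInference API). A minimal sketch without the Beam dependency, with a hypothetical threshold "model" standing in for the TensorFlow one:

```python
class AnomalyPredictFn:
    """Mimics the Beam DoFn lifecycle: load the model once per worker in
    setup(), then reuse it for every element in process().
    (In a real pipeline this class would subclass apache_beam.DoFn.)"""

    def __init__(self, model_loader):
        # Defer loading: the function ships to workers before setup() runs.
        self._model_loader = model_loader
        self._model = None

    def setup(self):
        # Runs once per worker, so there is no per-element model load
        # and no per-element network hop to an external endpoint.
        self._model = self._model_loader()

    def process(self, element):
        yield {"input": element, "score": self._model(element)}


# Stub stand-in for the TensorFlow model: flags large readings as
# anomalous (hypothetical threshold of 100).
fn = AnomalyPredictFn(lambda: (lambda x: float(x > 100)))
fn.setup()
results = [out for e in [5, 250] for out in fn.process(e)]
print(results)  # [{'input': 5, 'score': 0.0}, {'input': 250, 'score': 1.0}]
```

This is why B minimizes serving latency relative to A, C, and D: there is no network round trip per prediction.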
SausageMuffins
1 year, 6 months ago
Selected Answer: B
It's a toss up between B and C. I chose B because using vertex AI as an endpoint introduces network latency which naturally does not meet the criteria of "minimizing latency". However, choosing option B also implies that I have more overhead by directly running the model in the dataflow pipeline. Since the question didn't mention any limitations on resources, I assumed that the resources can be scaled accordingly to minimize latency. I might be overthinking on this option though seeing how most of Google questions have a strong preference on their "recommended platforms" like vertex AI. Most of the questions and the community answers seem to tend towards anything that mentions "vertex ai".
upvoted 6 times
...
tavva_prudhvi
2 years, 4 months ago
Selected Answer: C
In this case, the best way to minimize the serving latency of the system log anomaly detection model is to deploy it to a Vertex AI endpoint. This will allow Dataflow to invoke the model directly, without having to load it into the job as a dependency. This will significantly reduce the serving latency, as Dataflow will not have to wait for the model to load before it can make a prediction. Option B would involve loading the model directly into the Dataflow job as a dependency. This would also add an additional layer of latency, as Dataflow would have to load the model into memory before it could make a prediction.
upvoted 3 times
...
Voyager2
2 years, 5 months ago
C. Deploy the model to a Vertex AI endpoint, and invoke this endpoint in the Dataflow job https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning
upvoted 1 times
...
julliet
2 years, 5 months ago
Selected Answer: C
C. I eliminate B because Dataflow is a batch-prediction solution, not real-time.
upvoted 2 times
7cb0ab3
1 year, 7 months ago
Dataflow has a streaming pipeline solution as well.
upvoted 3 times
...
Fer660
2 months, 2 weeks ago
dataflow = apache beam = batch & stream
upvoted 1 times
...
...
M25
2 years, 6 months ago
Selected Answer: C
Went with C
upvoted 1 times
...
Antmal
2 years, 6 months ago
Selected Answer: C
I believe it is C when deploying the model to a Vertex AI endpoint it provides a dedicated prediction service optimised for real-time inference. Vertex AI endpoints are designed for high performance and low latency, making them ideal for real-time prediction use cases. Dataflow can easily invoke the Vertex AI endpoint to perform predictions, minimising serving latency.
upvoted 1 times
...
hghdh5454
2 years, 7 months ago
Selected Answer: B
B. Load the model directly into the Dataflow job as a dependency, and use it for prediction. By loading the model directly into the Dataflow job as a dependency, you minimize the serving latency since the model is available within the pipeline itself. This way, you avoid additional network latency that would be introduced by invoking external services, such as Cloud Run, Vertex AI endpoints, or TFServing containers.
upvoted 6 times
Antmal
2 years, 6 months ago
Actually in retrospect C is the correct answer, not B because loading the model directly into the Dataflow job as a dependency may cause unnecessary overhead, as Dataflow jobs are primarily designed for batch processing and may not be optimized for real-time prediction. Additionally, loading the model as a dependency may increase the size of the Dataflow job and introduce complexity in managing dependencies.
upvoted 1 times
desertlotus1211
8 months, 1 week ago
Overhead and latency are not the same thing. The question asks to minimize latency, not cost.
upvoted 2 times
...
...
...
wlts
2 years, 7 months ago
Selected Answer: B
By loading the model directly into the Dataflow job as a dependency, you can perform predictions within the same job. This approach helps minimize serving latency since there is no need to make external calls to another service or endpoint. Instead, the model is directly available within the Dataflow pipeline, allowing for efficient and fast processing of the streaming data.
upvoted 1 times
...
TNT87
2 years, 8 months ago
Selected Answer: C
C. Deploy the model to a Vertex AI endpoint, and invoke this endpoint in the Dataflow job. The reason for this choice is that deploying the model to a Vertex AI endpoint and invoking it in the Dataflow job is the most efficient and scalable option for real-time prediction. Vertex AI provides a fully managed, serverless platform for deploying and serving machine learning models. It allows for high availability and low-latency serving of models, and can handle a large volume of requests in parallel. Invoking the model via an endpoint in the Dataflow job minimizes the latency for model prediction, as it avoids any unnecessary data transfers or containerization
upvoted 2 times
TNT87
2 years, 6 months ago
Using private endpoints to serve online predictions with Vertex AI provides a low-latency, secure connection to the Vertex AI online prediction service. This guide shows how to configure private endpoints on Vertex AI by using VPC Network Peering to peer your network with the Vertex AI online prediction service https://cloud.google.com/vertex-ai/docs/predictions/using-private-endpoints Answer C
upvoted 1 times
...
...
shankalman717
2 years, 8 months ago
Selected Answer: C
Option B, loading the model directly into the Dataflow job as a dependency and using it for prediction, may not provide the optimal performance because Dataflow may not be optimized for low-latency predictions.
upvoted 1 times
...
John_Pongthorn
2 years, 9 months ago
These are the answers: https://cloud.google.com/dataflow/docs/notebooks/run_inference_tensorflow https://beam.apache.org/documentation/sdks/python-machine-learning/ https://beam.apache.org/documentation/transforms/python/elementwise/runinference/
upvoted 2 times
...

Topic 1 Question 131


You are an ML engineer at a mobile gaming company. A data scientist on your team recently trained a TensorFlow model, and you are responsible for deploying this model into a mobile application. You discover that the inference latency of the current model doesn’t meet production requirements. You need to reduce the inference time by 50%, and you are willing to accept a small decrease in model accuracy in order to reach the latency requirement. Without training a new model, which model optimization technique for reducing latency should you try first?

  • A. Weight pruning
  • B. Dynamic range quantization
  • C. Model distillation
  • D. Dimensionality reduction
Suggested Answer: B

Comments

TNT87
Highly Voted 1 year, 2 months ago
B. Dynamic range quantization The reason for this choice is that dynamic range quantization is a model optimization technique that can significantly reduce model size and inference time while maintaining reasonable model accuracy. Dynamic range quantization uses fewer bits to represent the weights of the model, reducing the memory required to store the model and the time required for inference.
upvoted 7 times
...
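Option B requires no retraining: with TFLite it is applied at conversion time (setting `converter.optimizations = [tf.lite.Optimize.DEFAULT]` on a `tf.lite.TFLiteConverter`). The arithmetic it applies to the stored weights can be sketched with NumPy, using illustrative weight values — per-tensor scale, int8 storage, a small rounding error:

```python
import numpy as np

# Float32 weights from a trained layer (illustrative values).
w = np.array([0.8, -1.1, 0.05, 2.4, -0.3], dtype=np.float32)

# Dynamic range quantization stores weights as int8 with a per-tensor
# scale: ~4x smaller model and faster integer kernels at inference.
scale = float(np.abs(w).max()) / 127.0
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
print(w_int8.tolist())  # [42, -58, 3, 127, -16]

# Dequantizing shows the cost: a small per-weight rounding error, the
# "small decrease in model accuracy" the question is willing to accept.
w_restored = w_int8.astype(np.float32) * scale
max_err = float(np.abs(w - w_restored).max())
print(max_err < scale / 2)  # True: error bounded by half a quantization step
```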
julliet
Most Recent 11 months, 3 weeks ago
Selected Answer: B
B. A, C, D --> have to retrain
upvoted 4 times
...
M25
1 year ago
Selected Answer: B
Plus: “Magnitude-based weight pruning gradually zeroes out model weights during the training process to achieve model sparsity. Sparse models are easier to compress, and we can skip the zeroes during inference for latency improvements.” https://www.tensorflow.org/model_optimization/guide/pruning, where “during the training process” disqualifies Option A.
upvoted 1 times
M25
1 year ago
https://en.wikipedia.org/wiki/Knowledge_distillation is the process of transferring knowledge from a large model to a smaller one. As smaller models are less expensive to evaluate, they can be deployed on less powerful hardware (such as a mobile device). https://en.wikipedia.org/wiki/Dimensionality_reduction is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data. “Without training a new model” disqualifies both Option C and D.
upvoted 1 times
...
...
ares81
1 year, 4 months ago
Selected Answer: B
'Without training a new model' --> B
upvoted 4 times
...
hiromi
1 year, 4 months ago
Selected Answer: B
B - https://www.tensorflow.org/lite/performance/post_training_quantization#dynamic_range_quantization
upvoted 4 times
...
hiromi
1 year, 4 months ago
B -https://www.tensorflow.org/lite/performance/post_training_quantization#dynamic_range_quantization
upvoted 1 times
...
mil_spyro
1 year, 5 months ago
Selected Answer: B
The requirement is "Without training a new model" hence dynamic range quantization. https://www.tensorflow.org/lite/performance/post_training_quant
upvoted 4 times
...

Topic 1 Question 132

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 132 discussion

You work on a data science team at a bank and are creating an ML model to predict loan default risk. You have collected and cleaned hundreds of millions of records worth of training data in a BigQuery table, and you now want to develop and compare multiple models on this data using TensorFlow and Vertex AI. You want to minimize any bottlenecks during the data ingestion stage while considering scalability. What should you do?

  • A. Use the BigQuery client library to load data into a dataframe, and use tf.data.Dataset.from_tensor_slices() to read it.
  • B. Export data to CSV files in Cloud Storage, and use tf.data.TextLineDataset() to read them.
  • C. Convert the data into TFRecords, and use tf.data.TFRecordDataset() to read them.
  • D. Use TensorFlow I/O’s BigQuery Reader to directly read the data.
Suggested Answer: D 🗳️

Comments

hiromi
Highly Voted 2 years, 4 months ago
Selected Answer: D
D - https://www.tensorflow.org/io/api_docs/python/tfio/bigquery
upvoted 8 times
...
mil_spyro
Highly Voted 2 years, 5 months ago
Selected Answer: D
Vote on D. This will allow to directly access the data from BigQuery without having to first load it into a dataframe or export it to files in Cloud Storage.
upvoted 7 times
...
OpenKnowledge
Most Recent 1 month ago
Selected Answer: D
The BigQuery reader in TensorFlow, part of the tensorflow-io library, allows users to read data from BigQuery tables directly into TensorFlow for model training and evaluation. It is designed for parallel processing, using the BigQuery Storage API to retrieve data in high-throughput parallel streams. The BigQueryReadSession, the key component in TensorFlow I/O, is configured to create a set of parallel data streams and handles parallel data access. The BigQuery reader is designed to handle large-scale datasets stored in BigQuery.
upvoted 1 times
...
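Why streaming beats load-everything ingestion can be illustrated with a stdlib-only sketch (a toy generator pipeline, not the actual tfio BigQuery API; the real reader additionally parallelizes multiple streams via the BigQuery Storage API):

```python
from itertools import islice

def stream_batches(row_source, batch_size):
    """Lazily pull fixed-size batches from a row iterator, the way a
    direct reader streams rows from BigQuery instead of materializing
    the whole table in memory first (toy illustration)."""
    batch = []
    for row in row_source:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

# A generator stands in for hundreds of millions of BigQuery rows;
# nothing is materialized until a batch is actually requested.
rows = ({"loan_id": i, "default": i % 7 == 0} for i in range(10**6))
first_two = list(islice(stream_batches(rows, 512), 2))
```

This is the contrast with option A, where `from_tensor_slices()` on a DataFrame requires the entire dataset to fit in memory before training can start.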
Fer660
2 months, 2 weeks ago
Selected Answer: C
We have two goals: avoid ingestion bottlenecks, and keep an eye on scalability. TFRecords are the gold standard for ingestion speed. Yes, we will have to spend some initial efforts to export all this to a storage bucket and convert to TFRecord, but the problem statement does not preclude this. The fact that we will be training many models on this data also indicates that it might be worthwhile to spend some time for the conversion upfront, in order to get repeated benefits down the line.
upvoted 2 times
...
spradhan
3 months, 3 weeks ago
Selected Answer: C
I think you are training multiple models. Does it not make more sense to do the TFRecord conversion once instead of reading millions of records for each model?
upvoted 1 times
...
desertlotus1211
8 months, 1 week ago
Selected Answer: C
Why not C? Answer D may introduce latency or bottlenecks due to network constraints and is not as optimized for large-scale training as the TFRecord approach. Thoughts?
upvoted 1 times
...
fitri001
1 year ago
Selected Answer: D
Direct Data Access: TensorFlow I/O's BigQuery Reader allows you to directly access data from BigQuery tables within your TensorFlow script. This eliminates the need for intermediate data movement (e.g., to CSV files) and data manipulation steps (e.g., loading into DataFrames). Scalability: BigQuery Reader is designed to handle large datasets efficiently. It leverages BigQuery's parallel processing capabilities to stream data into your TensorFlow training pipeline, minimizing processing bottlenecks and enabling scalability as your data volume grows.
upvoted 3 times
fitri001
1 year ago
A. BigQuery Client Library and Dataframe: While the BigQuery client library can access BigQuery data, loading it into a DataFrame and using tf.data.Dataset.from_tensor_slices() is inefficient for massive datasets due to memory limitations and potential processing bottlenecks. B. CSV Files and TextLineDataset: Exporting data to CSV and using tf.data.TextLineDataset() introduces unnecessary data movement and processing overhead, hindering both efficiency and scalability. C. TFRecords: TFRecords can be efficient for certain use cases, but converting hundreds of millions of records into TFRecords can be time-consuming and resource-intensive. Additionally, reading them might require parsing logic within your TensorFlow script.
upvoted 2 times
...
...
guilhermebutzke
1 year, 3 months ago
Selected Answer: D
D https://cloud.google.com/blog/products/ai-machine-learning/tensorflow-enterprise-makes-accessing-data-on-google-cloud-faster-and-easier
upvoted 1 times
...
julliet
1 year, 11 months ago
Selected Answer: D
D. BigQuery is a more compact way to store the data than TFRecords
upvoted 2 times
...
M25
2 years ago
Selected Answer: D
Went with D
upvoted 1 times
...
TNT87
2 years, 2 months ago
Selected Answer: D
D. Use TensorFlow I/O’s BigQuery Reader to directly read the data. The reason for this choice is that using TensorFlow I/O’s BigQuery Reader is the most efficient and scalable option for reading data directly from BigQuery into TensorFlow models. It allows for distributed processing and avoids unnecessary data duplication, which can cause bottlenecks and consume large amounts of storage. Additionally, the BigQuery Reader is optimized for reading data in parallel from BigQuery tables and streaming them directly into TensorFlow. This eliminates the need for any intermediate file formats or data copies, reducing latency and increasing performance.
upvoted 3 times
...

Topic 1 Question 133

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 133 discussion

You have recently created a proof-of-concept (POC) deep learning model. You are satisfied with the overall architecture, but you need to determine the value for a couple of hyperparameters. You want to perform hyperparameter tuning on Vertex AI to determine both the appropriate embedding dimension for a categorical feature used by your model and the optimal learning rate. You configure the following settings:
• For the embedding dimension, you set the type to INTEGER with a minValue of 16 and maxValue of 64.
• For the learning rate, you set the type to DOUBLE with a minValue of 10e-05 and maxValue of 10e-02.

You are using the default Bayesian optimization tuning algorithm, and you want to maximize model accuracy. Training time is not a concern. How should you set the hyperparameter scaling for each hyperparameter and the maxParallelTrials?

  • A. Use UNIT_LINEAR_SCALE for the embedding dimension, UNIT_LOG_SCALE for the learning rate, and a large number of parallel trials.
  • B. Use UNIT_LINEAR_SCALE for the embedding dimension, UNIT_LOG_SCALE for the learning rate, and a small number of parallel trials.
  • C. Use UNIT_LOG_SCALE for the embedding dimension, UNIT_LINEAR_SCALE for the learning rate, and a large number of parallel trials.
  • D. Use UNIT_LOG_SCALE for the embedding dimension, UNIT_LINEAR_SCALE for the learning rate, and a small number of parallel trials.
Suggested Answer: B 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 9 months ago
Selected Answer: B
B: Here's why: **Embedding Dimension:** UNIT_LINEAR_SCALE is appropriate for integer hyperparameters with a continuous range like the embedding dimension. It linearly scales the search space from minValue to maxValue. **Learning Rate:** UNIT_LOG_SCALE is generally recommended for hyperparameters with values spanning multiple orders of magnitude like the learning rate (10e-05 - 10e-02). This ensures equal sampling probability across different log-scaled ranges. **Parallel Trials:** as the documentation specifies, parallelization speeds up. However, this speedup comes at the cost of potentially sacrificing the quality of the results. Since training time is not a factor in this case, the benefit of speeding things up with many parallel trials is less valuable. https://cloud.google.com/vertex-ai/docs/training/using-hyperparameter-tuning#parallel-trials
upvoted 17 times
LFavero
1 year, 8 months ago
this is the perfect answer and explanation.
upvoted 3 times
...
...
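The two scalings guilhermebutzke contrasts can be sketched in plain Python (illustrative only; Vertex AI applies these transforms internally when you set `scaleType` on a parameter):

```python
import math

def sample_linear(lo, hi, u):
    """UNIT_LINEAR_SCALE: map a unit-interval draw u linearly into [lo, hi].
    Suits the embedding dimension, whose range 16..64 spans one scale."""
    return lo + u * (hi - lo)

def sample_log(lo, hi, u):
    """UNIT_LOG_SCALE: map u so each decade of [lo, hi] gets equal
    probability mass. Suits the learning rate, which spans ~3 decades."""
    return math.exp(math.log(lo) + u * (math.log(hi) - math.log(lo)))

# Embedding dimension: the midpoint draw lands at 40, halfway in value.
dim = round(sample_linear(16, 64, 0.5))
# Learning rate over 1e-4..1e-1 (the question's 10e-05..10e-02): the
# midpoint draw lands near 3.2e-3, halfway in *orders of magnitude* --
# a linear midpoint would instead be ~0.05, biased toward large rates.
lr = sample_log(1e-4, 1e-1, 0.5)
```

With Bayesian optimization, fewer parallel trials let each trial condition on more completed results, which is why option B's "small number of parallel trials" wins when training time is not a concern.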
YangG
Highly Voted 2 years, 11 months ago
Selected Answer: B
Vote B
upvoted 11 times
...
OpenKnowledge
Most Recent 1 month ago
Selected Answer: B
Hyperparameter tuning is the process of running multiple trials to find the set of hyperparameters that yields the best model performance. Running parallel trials has the benefit of reducing the time the training job takes. However, running in parallel can reduce the effectiveness of the tuning job overall, because hyperparameter tuning uses the results of previous trials to inform the values assigned to the hyperparameters of subsequent trials. When running in parallel, some trials start without the benefit of the results of trials still running. The choice of maxParallelTrials involves a trade-off between the speed of the tuning process and the quality of the final result, especially when using Bayesian optimization. With a higher maxParallelTrials, the total tuning time is greatly reduced because many trials run at the same time, but the quality of the tuning is reduced as well.
upvoted 1 times
...
NamitSehgal
8 months, 3 weeks ago
Selected Answer: A
maxParallelTrials: Small number of parallel trials: A small number of trials would limit the exploration of the hyperparameter space and might prevent you from finding the best possible model. Since training time is not a concern for you, and you want to maximize model accuracy, using a large number of parallel trials is beneficial.
upvoted 2 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: A
as training time is not an issue hence the answer should be A
upvoted 3 times
...
pinimichele01
1 year, 6 months ago
Training time is not a concern -> B (the benefit of speeding things up with many parallel trials is less valuable)
upvoted 1 times
...
gscharly
1 year, 7 months ago
Selected Answer: B
Vote B
upvoted 1 times
...
Mickey321
1 year, 12 months ago
Selected Answer: A
because training time is not a concern and you want to maximize accuracy, using a large number of maxParallelTrials (option A) allows thoroughly searching the hyperparameter space.
upvoted 3 times
...
Voyager2
2 years, 5 months ago
Selected Answer: B
B. Use UNIT_LINEAR_SCALE for the embedding dimension, UNIT_LOG_SCALE for the learning rate, and a small number of parallel trials. https://cloud.google.com/vertex-ai/docs/training/using-hyperparameter-tuning First we should choos an option with small trials: "Before starting a job with a large number of trials, you may want to start with a small number of trials to gauge the effect your chosen hyperparameters have on your model's accuracy." Now, the embeddings should be linear https://cloud.google.com/blog/products/gcp/hyperparameter-tuning-on-google-cloud-platform-is-now-faster-and-smarter
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: B
Went with B
upvoted 1 times
...
JamesDoe
2 years, 7 months ago
Selected Answer: B
https://cloud.google.com/vertex-ai/docs/training/using-hyperparameter-tuning#parallel-trials "Running parallel trials has the benefit of reducing the time the training job takes (real time—the total processing time required is not typically changed). However, running in parallel can reduce the effectiveness of the tuning job overall." Since opt. for accuracy and ignore training time, use above. Linear for learning rate doesn't really make sense, think that one is obvious imo.
upvoted 2 times
...
TNT87
2 years, 8 months ago
Selected Answer: B
Answer is B , even my explanation is on B not C Option B is the best choice: Use UNIT_LOG_SCALE for the embedding dimension, UNIT_LINEAR_SCALE for the learning rate, and a large number of parallel trials. The reason for this choice is as follows: For the embedding dimension, it is better to use a logarithmic scale because the effect of increasing the dimensionality is likely to diminish as the dimension grows larger. Therefore, the logarithmic scale will allow the tuning algorithm to explore a wider range of values with less bias towards higher values
upvoted 1 times
...
TNT87
2 years, 8 months ago
Selected Answer: C
Option C is the best choice: Use UNIT_LOG_SCALE for the embedding dimension, UNIT_LINEAR_SCALE for the learning rate, and a large number of parallel trials. The reason for this choice is as follows: For the embedding dimension, it is better to use a logarithmic scale because the effect of increasing the dimensionality is likely to diminish as the dimension grows larger. Therefore, the logarithmic scale will allow the tuning algorithm to explore a wider range of values with less bias towards higher values
upvoted 1 times
TNT87
2 years, 8 months ago
Meant to choose B ahhhh
upvoted 1 times
...
...
John_Pongthorn
2 years, 9 months ago
Selected Answer: B
Learning rate is subtle and spans orders of magnitude, so it uses a log scale
upvoted 2 times
...
ares81
2 years, 10 months ago
Selected Answer: B
It's B!
upvoted 1 times
...
hiromi
2 years, 10 months ago
Selected Answer: A
A - https://cloud.google.com/ai-platform/training/docs/reference/rest/v1/projects.jobs#HyperparameterSpec - https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/StudySpec
upvoted 1 times
hiromi
2 years, 10 months ago
Sorry, B is the answer
upvoted 1 times
...
...
mil_spyro
2 years, 11 months ago
Selected Answer: D
Vote D, this can help the tuning algorithm explore a wider range of values for the learning rate, while also focusing on a smaller range of values for the embedding dimension.
upvoted 2 times
...

Topic 1 Question 134

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 134 discussion

You are the Director of Data Science at a large company, and your Data Science team has recently begun using the Kubeflow Pipelines SDK to orchestrate their training pipelines. Your team is struggling to integrate their custom Python code into the Kubeflow Pipelines SDK. How should you instruct them to proceed in order to quickly integrate their code with the Kubeflow Pipelines SDK?

  • A. Use the func_to_container_op function to create custom components from the Python code.
  • B. Use the predefined components available in the Kubeflow Pipelines SDK to access Dataproc, and run the custom code there.
  • C. Package the custom Python code into Docker containers, and use the load_component_from_file function to import the containers into the pipeline.
  • D. Deploy the custom Python code to Cloud Functions, and use Kubeflow Pipelines to trigger the Cloud Function.
Suggested Answer: A 🗳️

Comments

TNT87
Highly Voted 1 year, 2 months ago
Selected Answer: A
A. Use the func_to_container_op function to create custom components from the Python code. The func_to_container_op function in the Kubeflow Pipelines SDK is specifically designed to convert Python functions into containerized components that can be executed in a Kubernetes cluster. By using this function, the Data Science team can easily integrate their custom Python code into the Kubeflow Pipelines SDK without having to learn the details of containerization or Kubernetes.
upvoted 5 times
...
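What `func_to_container_op` does can be approximated with a stdlib-only sketch (a toy stand-in, not the kfp SDK; the real function also serializes the function's source code and produces a full container component spec):

```python
def func_to_component(func, base_image="python:3.9"):
    """Toy stand-in for kfp's func_to_container_op: wrap a plain Python
    function in a component record that a pipeline could schedule as a
    containerized step, without the author writing any Dockerfile."""
    return {
        "name": func.__name__,
        "base_image": base_image,  # the image the step would run in
        "run": func,               # the wrapped user code
    }

# Any self-contained Python function can become a pipeline step.
def normalize(values: list) -> list:
    total = sum(values)
    return [v / total for v in values]

component = func_to_component(normalize)
result = component["run"]([1, 1, 2])
```

The point of option A is exactly this ergonomics: the team keeps writing ordinary Python functions, and the SDK handles the containerization.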
hiromi
Highly Voted 1 year, 4 months ago
Selected Answer: A
A -https://kubeflow-pipelines.readthedocs.io/en/stable/source/kfp.components.html?highlight=func_to_container_op%20#kfp.components.func_to_container_op
upvoted 5 times
...
M25
Most Recent 1 year ago
Selected Answer: A
Went with A
upvoted 2 times
...
Antmal
1 year ago
Selected Answer: A
The answer is A, because the Kubeflow Pipelines SDK provides a convenient way to create custom components from existing Python code using the func_to_container_op function. This allows the data science team to encapsulate the custom code as containerised components that can be easily integrated into the Kubeflow pipeline. This approach allows for seamless integration of custom Python code into the Kubeflow Pipelines SDK without requiring additional dependencies or infrastructure setup.
upvoted 4 times
...
mil_spyro
1 year, 5 months ago
Selected Answer: A
Use the func_to_container_op function to create custom components from their code. This function allows you to define a Python function that can be used as a pipeline component, and it automatically creates a Docker container with the necessary dependencies
upvoted 3 times
...

Topic 1 Question 135

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 135 discussion

You work for the AI team of an automobile company, and you are developing a visual defect detection model using TensorFlow and Keras. To improve your model performance, you want to incorporate some image augmentation functions such as translation, cropping, and contrast tweaking. You randomly apply these functions to each training batch. You want to optimize your data processing pipeline for run time and compute resources utilization. What should you do?

  • A. Embed the augmentation functions dynamically in the tf.Data pipeline.
  • B. Embed the augmentation functions dynamically as part of Keras generators.
  • C. Use Dataflow to create all possible augmentations, and store them as TFRecords.
  • D. Use Dataflow to create the augmentations dynamically per training run, and stage them as TFRecords.
Suggested Answer: A 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 3 months ago
Selected Answer: A
Option A: By embedding augmentation in the tf.data pipeline, data augmentation is applied on-the-fly during training, reducing the need to store pre-augmented data. Option B could be a choice, but since Keras generators are built on top of tf.data, they are less flexible and have a lower level of optimization compared to tf.data.
upvoted 6 times
...
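The on-the-fly batch augmentation idea can be sketched with a stdlib-only pipeline (toy stand-ins for images and augmentations; in TensorFlow you would instead chain `tf.data`'s `map()` with ops such as `tf.image.random_contrast`):

```python
import random

def make_pipeline(batches, augmentations, rng):
    """Toy sketch of on-the-fly augmentation: each batch gets one
    randomly chosen augmentation applied as it is drawn, so no
    pre-augmented copies are ever generated or stored."""
    for batch in batches:
        aug = rng.choice(augmentations)
        yield [aug(x) for x in batch]

# Plain numbers stand in for images; simple callables stand in for
# translation / contrast tweaking.
def shift(x):
    return x + 1   # stands in for translation

def scale(x):
    return x * 2   # stands in for contrast tweaking

rng = random.Random(0)
out = list(make_pipeline([[1, 2], [3, 4]], [shift, scale], rng))
```

Because augmentation happens lazily per batch, storage stays flat and CPU work can overlap GPU training, which is the run-time and resource argument for option A over precomputing all augmentations (options C/D).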
tavva_prudhvi
Most Recent 1 year, 6 months ago
Selected Answer: A
A is best because, 1. It allows you to apply the augmentations on-the-fly during training, which eliminates the need for pre-processing and storing a large number of augmented images. This saves both storage space and compute resources. 2. The tf.Data pipeline is highly optimized for efficient data loading and processing, ensuring that your model training process is not bottlenecked by data preprocessing. 3. By applying augmentations randomly to each training batch, you increase the diversity of your training data, which can help your model generalize better to unseen data. Keras generators can be used for data augmentation, but tf.Data pipelines are generally more efficient and flexible for creating complex data processing pipelines.
upvoted 2 times
...
pico
1 year, 6 months ago
Selected Answer: B
B (but also A) B is a common and efficient approach for applying data augmentation during training. This allows you to apply data augmentation on-the-fly without the need to pre-generate or store augmented images separately, which saves storage space and reduces the preprocessing time. Keras provides various tools and functions for data augmentation, and you can easily incorporate them into your training data pipeline. A can also be a good choice, especially if you are using TensorFlow's tf.data API for data loading and preprocessing. It can provide similar benefits by applying augmentations on-the-fly, but it may require more custom code to implement compared to Keras data generators.
upvoted 1 times
guilhermebutzke
1 year, 3 months ago
Yes, but I think the question says: "You want to optimize your data processing pipeline for run time and compute resources utilization". Keras generators are not as optimized as tf.data
upvoted 1 times
...
...
envest
1 year, 9 months ago
by abylead: B) Keras generators' embedded augmentation functions offer at least translation, crop, and contrast preprocessing. You can either permanently integrate the functions, or randomly apply them to a dataset with non-CPU-blocking async training batches and optimized GPU processing overlap. When applying Keras's embedded augmentation functions, the tf.data pipeline can still be performance-optimized. With tf.image pipelines you lack pipeline performance optimization, and the translation function is deprecated. In addition, the complex application hinders the flexibility of random operations.
upvoted 1 times
...
PST21
1 year, 9 months ago
B - TensorFlow's Keras API provides built-in support for data augmentation using various image preprocessing layers, such as RandomTranslation, RandomCrop, and RandomContrast, among others. You can create custom image augmentation functions and include them as part of your Keras generators, tailoring them to your specific use case and needs. In summary, Option B, embedding the augmentation functions dynamically as part of Keras generators, offers efficient on-the-fly data augmentation, reduced storage overhead, optimized resource utilization, and greater flexibility, making it the best choice for thisscenario.
upvoted 1 times
tavva_prudhvi
1 year, 9 months ago
Although Keras generators can be used for data augmentation, using the tf.data pipeline provides better performance and efficiency. The tf.data API is more flexible and better integrated with TensorFlow, allowing for more optimizations, especially if you have a large number of images to process.
upvoted 2 times
...
...
M25
2 years ago
Selected Answer: A
Went with A
upvoted 1 times
...
matamata415
2 years, 1 month ago
Selected Answer: A
https://www.tensorflow.org/tutorials/load_data/images?hl=ja#tfdata_%E3%82%92%E4%BD%BF%E7%94%A8%E3%81%97%E3%81%A6%E3%82%88%E3%82%8A%E7%B2%BE%E5%AF%86%E3%81%AB%E5%88%B6%E5%BE%A1%E3%81%99%E3%82%8B
upvoted 2 times
matamata415
2 years, 1 month ago
https://www.tensorflow.org/tutorials/load_data/images#using_tfdata_for_finer_control
upvoted 2 times
...
...
Yajnas_arpohc
2 years, 1 month ago
Selected Answer: A
https://towardsdatascience.com/time-to-choose-tensorflow-data-over-imagedatagenerator-215e594f2435
upvoted 1 times
...
TNT87
2 years, 2 months ago
Selected Answer: A
A. Embed the augmentation functions dynamically in the tf.Data pipeline is the best approach to optimize the data processing pipeline for runtime and compute resource utilization. Using the tf.data pipeline, you can apply data augmentation functions dynamically to each batch during training. This approach avoids the overhead of creating preprocessed TFRecords or Keras generators, which can consume additional disk space, memory, and CPU. Additionally, using the tf.data pipeline, you can parallelize data preprocessing, input pipeline operations, and model training
upvoted 3 times
...
shankalman717
2 years, 2 months ago
Selected Answer: A
Embedding the augmentation functions dynamically in the tf.Data pipeline allows the data pipeline to apply the augmentations on the fly as the data is being loaded into the model during training. This means that the model can utilize the compute resources effectively by loading and processing the data as needed, rather than pre-generating all possible augmentations ahead of time (as in options C and D), which could be computationally expensive and time-consuming. Option B is also a viable choice, but it may not be as efficient as option A since the data augmentation functions would be applied during training using Keras generators, which could cause some overhead.
upvoted 3 times
...
pshemol
2 years, 3 months ago
Selected Answer: B
will go for B too https://www.analyticsvidhya.com/blog/2020/08/image-augmentation-on-the-fly-using-keras-imagedatagenerator/
upvoted 1 times
...
John_Pongthorn
2 years, 3 months ago
Either of A or B : I am not convinced of what the right answer is. but it is on https://www.tensorflow.org/tutorials/images/data_augmentation#apply_augmentation_to_a_dataset certainly
upvoted 1 times
...
hiromi
2 years, 4 months ago
Selected Answer: A
A (not sure)
upvoted 1 times
...
YangG
2 years, 5 months ago
Selected Answer: B
will go for B https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
upvoted 2 times
...
mil_spyro
2 years, 5 months ago
Selected Answer: A
incorporating the augmentation functions into the pipeline, you can apply them dynamically to each training batch, without the need to generate all possible augmentations in advance or stage them as TFRecords.
upvoted 4 times
...

Topic 1 Question 136

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 136 discussion

You work for an online publisher that delivers news articles to over 50 million readers. You have built an AI model that recommends content for the company’s weekly newsletter. A recommendation is considered successful if the article is opened within two days of the newsletter’s published date and the user remains on the page for at least one minute.

All the information needed to compute the success metric is available in BigQuery and is updated hourly. The model is trained on eight weeks of data, on average its performance degrades below the acceptable baseline after five weeks, and training time is 12 hours. You want to ensure that the model’s performance is above the acceptable baseline while minimizing cost. How should you monitor the model to determine when retraining is necessary?

  • A. Use Vertex AI Model Monitoring to detect skew of the input features with a sample rate of 100% and a monitoring frequency of two days.
  • B. Schedule a cron job in Cloud Tasks to retrain the model every week before the newsletter is created.
  • C. Schedule a weekly query in BigQuery to compute the success metric.
  • D. Schedule a daily Dataflow job in Cloud Composer to compute the success metric.
Suggested Answer: C 🗳️

Comments

TNT87
Highly Voted 2 years, 2 months ago
Selected Answer: C
Option C is the best answer. Since all the information needed to compute the success metric is available in BigQuery and is updated hourly, scheduling a weekly query in BigQuery to compute the success metric is the simplest and most cost-effective way to monitor the model's performance. By comparing the computed success metric against the acceptable baseline, you can determine when the model's performance has degraded below the threshold, and retrain the model accordingly. This approach avoids the cost of additional monitoring infrastructure and leverages existing data processing capabilities.
upvoted 9 times
...
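The weekly check TNT87 describes could look roughly like this (all table and column names are hypothetical placeholders; the two-day open window and one-minute dwell time come from the question itself):

```python
BASELINE = 0.60  # hypothetical acceptable success-rate floor

def success_metric_query(dataset="newsletter", table="article_events"):
    """Build the weekly BigQuery query: a recommendation counts as
    successful if the article is opened within 2 days of publication
    and the reader stays on the page for at least 60 seconds."""
    return f"""
    SELECT
      COUNTIF(
        TIMESTAMP_DIFF(opened_at, published_at, DAY) <= 2
        AND seconds_on_page >= 60
      ) / COUNT(*) AS success_rate
    FROM `{dataset}.{table}`
    WHERE published_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
    """

def needs_retraining(success_rate, baseline=BASELINE):
    """Trigger the 12-hour retraining job only when the metric dips."""
    return success_rate < baseline
```

Scheduling this as a BigQuery scheduled query costs only the weekly scan, which is why option C beats standing up extra monitoring infrastructure.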
fitri001
Most Recent 1 year ago
Selected Answer: C
Weekly checks are frequent enough to catch performance degradation before the next newsletter (5-week threshold). The success metric can be directly calculated within the query, providing a clear indication for retraining.
upvoted 3 times
fitri001
1 year ago
A. Vertex AI Model Monitoring for feature skew: This monitors data drift, which can be helpful, but it doesn't directly address the success metric of article opens and dwell time. B. Cron job for weekly retraining: Retraining every week, regardless of performance, is excessive and costly, considering the 12-hour training time. D. Daily Dataflow job: While daily computation provides more data points, it might be overkill compared to a weekly check. Additionally, Cloud Composer adds complexity for a simple task.
upvoted 2 times
...
...
julliet
1 year, 11 months ago
Selected Answer: C
As we have all the data in BigQuery
upvoted 2 times
...
M25
2 years ago
Selected Answer: C
Went with C
upvoted 3 times
...
Antmal
2 years ago
Selected Answer: A
Option A because when using Vertex AI Model Monitoring, you can set up automated monitoring of the model's performance by detecting skew of the input features, which can help you identify any changes in the data distribution that may impact the model's performance. Setting the sample rate to 100% ensures that all incoming data is monitored, and a monitoring frequency of two days allows for timely detection of any deviations from the expected data distribution
upvoted 1 times
Antmal
1 year, 12 months ago
I have changed my mind. I will choose C
upvoted 1 times
...
...
John_Pongthorn
2 years, 2 months ago
Selected Answer: C
This question tweak from this article surely. https://cloud.google.com/blog/topics/developers-practitioners/continuous-model-evaluation-bigquery-ml-stored-procedures-and-cloud-scheduler
upvoted 2 times
...
John_Pongthorn
2 years, 3 months ago
The anwner is on here https://cloud.google.com/blog/topics/developers-practitioners/continuous-model-evaluation-bigquery-ml-stored-procedures-and-cloud-scheduler
upvoted 2 times
...
hiromi
2 years, 4 months ago
Selected Answer: C
C (not sure)
upvoted 1 times
...
pshemol
2 years, 4 months ago
Selected Answer: C
"All the information needed to compute the success metric is available in BigQuery" and "on average its performance degrades below the acceptable baseline after five weeks" so once per week is enough to check models performance. And it's the cheapest solution too.
upvoted 3 times
...
mil_spyro
2 years, 5 months ago
Selected Answer: D
This can help to ensure that the model’s performance is above the baseline, while minimizing cost by avoiding unnecessary retraining.
upvoted 1 times
...

Topic 1 Question 137

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 137 discussion

You deployed an ML model into production a year ago. Every month, you collect all raw requests that were sent to your model prediction service during the previous month. You send a subset of these requests to a human labeling service to evaluate your model’s performance. After a year, you notice that your model's performance sometimes degrades significantly after a month, while other times it takes several months to notice any decrease in performance. The labeling service is costly, but you also need to avoid large performance degradations. You want to determine how often you should retrain your model to maintain a high level of performance while minimizing cost. What should you do?

  • A. Train an anomaly detection model on the training dataset, and run all incoming requests through this model. If an anomaly is detected, send the most recent serving data to the labeling service.
  • B. Identify temporal patterns in your model’s performance over the previous year. Based on these patterns, create a schedule for sending serving data to the labeling service for the next year.
  • C. Compare the cost of the labeling service with the lost revenue due to model performance degradation over the past year. If the lost revenue is greater than the cost of the labeling service, increase the frequency of model retraining; otherwise, decrease the model retraining frequency.
  • D. Run training-serving skew detection batch jobs every few days to compare the aggregate statistics of the features in the training dataset with recent serving data. If skew is detected, send the most recent serving data to the labeling service.
Show Suggested Answer Hide Answer
Suggested Answer: D 🗳️
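The skew check option D describes can be sketched in plain Python: compare aggregate statistics of each feature in the training dataset against recent serving data, and flag features whose serving mean has drifted. The feature names, values, and threshold below are hypothetical; a production setup would typically use Vertex AI Model Monitoring rather than hand-rolled checks.

```python
import statistics

def detect_skew(train, serving, threshold=0.3):
    """Flag features whose serving mean drifts from the training mean
    by more than `threshold` training standard deviations."""
    skewed = []
    for feature, train_values in train.items():
        mu = statistics.mean(train_values)
        sigma = statistics.stdev(train_values)
        serving_mu = statistics.mean(serving[feature])
        if sigma > 0 and abs(serving_mu - mu) / sigma > threshold:
            skewed.append(feature)
    return skewed

# Hypothetical aggregates: "temperature" drifts, "pressure" does not.
train = {"temperature": [20, 21, 19, 20, 22],
         "pressure": [1.0, 1.1, 0.9, 1.0, 1.0]}
serving = {"temperature": [25, 26, 24, 25, 27],
           "pressure": [1.0, 1.05, 0.95, 1.0, 1.0]}
print(detect_skew(train, serving))  # ['temperature']
```

A non-empty result is the trigger for sending the most recent serving data to the labeling service, which is exactly the event-driven (rather than fixed-schedule) retraining signal that makes D cheaper than B.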

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
TNT87
Highly Voted 2 years, 8 months ago
Selected Answer: D
Option D is the best approach to determine how often to retrain the model while minimizing cost. Running training-serving skew detection batch jobs every few days to compare the aggregate statistics of the features in the training dataset with recent serving data is an effective way to detect when the model's performance has degraded. If skew is detected, the most recent serving data should be sent to the labeling service to evaluate the model's performance. This approach is more cost-effective than sending a subset of requests to the labeling service every month because it only sends data when there is a high probability that the model's performance has degraded. By doing this, the model can be retrained at the right time, and the cost of the labeling service can be minimized.
upvoted 7 times
...
NamitSehgal
Most Recent 8 months, 3 weeks ago
Selected Answer: B
skew detection does not tell overall model performance and running too frequent is not good.
upvoted 1 times
...
f084277
12 months ago
Selected Answer: D
Clearly D. B is just guesswork.
upvoted 1 times
...
M25
2 years, 6 months ago
Selected Answer: D
Went with D
upvoted 2 times
...
John_Pongthorn
2 years, 9 months ago
Selected Answer: D
D https://cloud.google.com/blog/topics/developers-practitioners/monitor-models-training-serving-skew-vertex-aiew-vertex-ai&ved=2ahUKEwiRg_aoj9n8AhWb7TgGHcGCDREQFnoECAwQAQ&usg=AOvVaw197NneIJM0ra7fLq2zsOin
upvoted 2 times
...
ares81
2 years, 10 months ago
Selected Answer: B
B looks the only option, to me.
upvoted 2 times
...
hiromi
2 years, 10 months ago
Selected Answer: D
D - https://cloud.google.com/blog/topics/developers-practitioners/monitor-models-training-serving-skew-vertex-ai - https://developers.google.com/machine-learning/guides/rules-of-ml
upvoted 4 times
...
mymy9418
2 years, 10 months ago
Selected Answer: D
I think D
upvoted 1 times
...
mil_spyro
2 years, 11 months ago
Selected Answer: B
"After a year, you notice that your model's performance sometimes degrades significantly after a month, while other times it takes several months to notice any decrease in performance." Hence I vote B
upvoted 2 times
...

Topic 1 Question 138

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 138 discussion

You work for a company that manages a ticketing platform for a large chain of cinemas. Customers use a mobile app to search for movies they’re interested in and purchase tickets in the app. Ticket purchase requests are sent to Pub/Sub and are processed with a Dataflow streaming pipeline configured to conduct the following steps:
1. Check for availability of the movie tickets at the selected cinema.
2. Assign the ticket price and accept payment.
3. Reserve the tickets at the selected cinema.
4. Send successful purchases to your database.

Each step in this process has low latency requirements (less than 50 milliseconds). You have developed a logistic regression model with BigQuery ML that predicts whether offering a promo code for free popcorn increases the chance of a ticket purchase, and this prediction should be added to the ticket purchase process. You want to identify the simplest way to deploy this model to production while adding minimal latency. What should you do?

  • A. Run batch inference with BigQuery ML every five minutes on each new set of tickets issued.
  • B. Export your model in TensorFlow format, and add a tfx_bsl.public.beam.RunInference step to the Dataflow pipeline.
  • C. Export your model in TensorFlow format, deploy it on Vertex AI, and query the prediction endpoint from your streaming pipeline.
  • D. Convert your model with TensorFlow Lite (TFLite), and add it to the mobile app so that the promo code and the incoming request arrive together in Pub/Sub.
Show Suggested Answer Hide Answer
Suggested Answer: D 🗳️
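The core idea behind option B (an in-pipeline tfx_bsl.public.beam.RunInference step) is that the model is scored in-process on the Dataflow worker, with no network hop to a remote endpoint. A minimal stand-in for that idea, with invented coefficients standing in for the exported BigQuery ML logistic regression:

```python
import math

# Hypothetical coefficients "exported" from the BigQuery ML logistic
# regression; the feature names and weights are purely illustrative.
WEIGHTS = {"ticket_price": -0.08, "is_weekend": 0.6}
INTERCEPT = 0.2

def predict_promo_uplift(features):
    """Score one request in-process, as a Dataflow DoFn could,
    avoiding the network round trip a Vertex AI endpoint would add."""
    z = INTERCEPT + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic sigmoid

p = predict_promo_uplift({"ticket_price": 12.0, "is_weekend": 1})
print(round(p, 3))  # ≈ 0.46
```

In-process scoring like this is microseconds per request, which is why it fits a sub-50 ms pipeline better than option C's remote endpoint call.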

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
behzadsw
Highly Voted 2 years, 4 months ago
Selected Answer: D
D as you want to do the prediction before the purchase
upvoted 11 times
...
hiromi
Highly Voted 2 years, 4 months ago
Selected Answer: D
D - https://www.tensorflow.org/lite/guide
upvoted 5 times
...
Fer660
Most Recent 2 months, 2 weeks ago
Selected Answer: B
Not A: we require real-time inference.
Not C: adds network hops, hence latency.
Not D: we are required to choose a simple deployment to prod. Pushing the model to the devices is never simple IMHO. Much easier to have a centralized piece, as in B.
upvoted 1 times
...
Amer95
8 months, 1 week ago
Selected Answer: B
The incorrect answers introduce latency issues or operational inefficiencies:
A: Running batch inference with BigQuery ML every five minutes causes delays due to interval-based processing.
C: Deploying the model on Vertex AI introduces network latency from HTTP requests.
D: Using TensorFlow Lite on mobile decentralizes inference but adds inconsistencies due to device variability and complicates updates.
Correct answer B: Exporting the model in TensorFlow format and integrating it into the Dataflow pipeline with tfx_bsl.public.beam.RunInference minimizes latency by keeping inference within the real-time streaming process. This ensures efficient and low-latency predictions.
upvoted 2 times
...
NamitSehgal
8 months, 3 weeks ago
Selected Answer: A
A: near real-time is sufficient here. D (convert to TFLite and deploy to the mobile app) is impractical due to data availability, model updates, and privacy concerns, and likely introduces more latency than a BigQuery ML batch prediction.
upvoted 1 times
...
bobjr
11 months, 1 week ago
Selected Answer: C
D makes no sense: if the prediction is made on the phone, why send it to the server? C is the best choice because it splits the responsibility and uses best practices and scalable tools.
upvoted 2 times
...
omermahgoub
1 year ago
Selected Answer: B
B. Export your model in TensorFlow format, and add a tfx_bsl.public.beam.RunInference step to the Dataflow pipeline. Here's why this approach offers minimal latency: In-Pipeline Prediction: The model is integrated directly into the Dataflow pipeline, enabling real-time predictions for each ticket purchase request without external calls. Dataflow Integration: tfx_bsl.public.beam.RunInference is a Beam utility specifically designed for integrating TensorFlow models into Dataflow pipelines, ensuring efficient execution.
upvoted 2 times
...
Yan_X
1 year, 1 month ago
Selected Answer: B
B. For D: how can we assume the model is feasible to convert for a mobile app?
upvoted 1 times
...
Krish6488
1 year, 6 months ago
Selected Answer: D
Question looks ambiguous! However considering some keywords like low latency and more importantly ML usage for maximising the ticket purchase requests using the promo code means that model embedded to the device looks more appropriate, however there are a lot of downsides to it like model management and upgrades but that does not seem to be the consideration here anyway. Just looking at low latency and ML to maximise the ticket sales, I will go with D as thats much simpler to implement
upvoted 2 times
...
andresvelasco
1 year, 8 months ago
The whole question does not make much sense to me. First of all, it seems that the Dataflow streaming job would "accept payment", meaning it communicates with payment gateways and back to the user, which does not sound right to do in Dataflow. The model that "predicts whether offering a promo code for free popcorn increases the chance of a ticket purchase" necessarily executes before processing payment, so D seems the best. Awkward...
upvoted 3 times
...
M25
2 years ago
Selected Answer: D
Went with D
upvoted 1 times
...
TNT87
2 years ago
Answer D
upvoted 4 times
...
TNT87
2 years, 1 month ago
Selected Answer: B
B is the simplest way to deploy the logistic regression model to production with minimal latency. Exporting the model in TensorFlow format and adding a tfx_bsl.public.beam.RunInference step to the existing Dataflow pipeline enables the model to be integrated directly into the ticket purchase process.
upvoted 1 times
tavva_prudhvi
1 year, 9 months ago
B would also not be suitable because adding a tfx_bsl.public.beam.RunInference step to the Dataflow pipeline would still require the model to be executed within the same pipeline, potentially introducing additional latency and computational overhead.
upvoted 1 times
...
...
TNT87
2 years, 2 months ago
Selected Answer: C
Option C is the best solution. Since the entire process has low latency requirements, running batch inference every five minutes is not a suitable option. Option B requires a TensorFlow model format, which may not be available since the model is created using BigQuery ML. Option D is not recommended because it requires deploying the model to the mobile app, which may not be feasible or desired. Deploying the model on Vertex AI and querying the prediction endpoint from the streaming pipeline adds minimal latency and is the simplest solution.
upvoted 1 times
TNT87
2 years, 1 month ago
Answer B
upvoted 1 times
...
TNT87
2 years, 2 months ago
Aiiii between B and C
upvoted 1 times
...
...
Scipione_
2 years, 2 months ago
Selected Answer: D
I perfectly agree with behzadsw. You send a Pub/Sub request when you already want to buy, you must add the coupon before this process.
upvoted 3 times
...
John_Pongthorn
2 years, 2 months ago
Selected Answer: B
B (if it is possible), based on what I get from this question: 1. This prediction should be added to the ticket purchase process, which means it has to be included in the Dataflow streaming pipeline. 2. Each step in this process has low latency requirements (less than 50 milliseconds), which signifies that whatever you process in Dataflow must stay within that latency requirement.
upvoted 3 times
John_Pongthorn
2 years, 2 months ago
https://www.tensorflow.org/tfx/tfx_bsl/api_docs/python/tfx_bsl/public/beam/RunInference
upvoted 1 times
...
...
TNT87
2 years, 4 months ago
Answer D https://www.tensorflow.org/lite/guide
upvoted 1 times
TNT87
2 years, 2 months ago
Nope answer is B
upvoted 1 times
...
...

Topic 1 Question 139

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 139 discussion

You work on a team in a data center that is responsible for server maintenance. Your management team wants you to build a predictive maintenance solution that uses monitoring data to detect potential server failures. Incident data has not been labeled yet. What should you do first?

  • A. Train a time-series model to predict the machines’ performance values. Configure an alert if a machine’s actual performance values significantly differ from the predicted performance values.
  • B. Develop a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data. Use this heuristic to monitor server performance in real time.
  • C. Develop a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data. Train a model to predict anomalies based on this labeled dataset.
  • D. Hire a team of qualified analysts to review and label the machines’ historical performance data. Train a model based on this manually labeled dataset.
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️
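The z-score heuristic that options B and C both start from can be sketched in a few lines: label historical data first, then either run the heuristic in real time (B) or train a model on the resulting labels (C). The sensor readings and the 2.5-sigma threshold below are invented for illustration.

```python
import statistics

def zscore_label(values, threshold=2.5):
    """Label each historical reading as anomalous (1) when its z-score
    exceeds the threshold, otherwise normal (0)."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [1 if abs(v - mu) / sigma > threshold else 0 for v in values]

# Hypothetical CPU temperatures with one obvious spike at index 6.
readings = [60, 61, 59, 62, 60, 61, 95, 60, 59, 61]
labels = zscore_label(readings)
print(labels)  # [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
```

The resulting (readings, labels) pairs are what option C would feed to a supervised anomaly model; option B stops at applying the function to live data.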

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
Mickey321
Highly Voted 1 year, 12 months ago
Selected Answer: B
Should be B
upvoted 6 times
Mickey321
1 year, 12 months ago
Compare to 94
upvoted 1 times
...
...
NamitSehgal
Most Recent 8 months, 3 weeks ago
Selected Answer: C
The model trained on heuristically labeled data can then be used to identify potential anomalies.
upvoted 2 times
...
rajshiv
11 months, 1 week ago
Selected Answer: C
The heuristic may work for monitoring in real time, but training a model on labeled data provides more accuracy over time as it adapts and improves. Simply using a heuristic to monitor the data does not allow for scalable automation of anomaly detection.
upvoted 4 times
...
omermahgoub
1 year, 7 months ago
Selected Answer: C
C. Develop a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data. Train a model to predict anomalies based on this labeled dataset. Real-Time Heuristic Monitoring (Option B): Using a z-score based heuristic for real-time monitoring can be helpful as an initial step, but it might not capture complex anomaly patterns that a trained model could identify.
upvoted 2 times
pinimichele01
1 year, 6 months ago
"What should you do first?"....
upvoted 2 times
...
...
ludovikush
1 year, 7 months ago
Selected Answer: C
I go for C because it is more practical and efficient.
upvoted 2 times
...
pico
1 year, 11 months ago
Selected Answer: D
D: This approach involves creating a labeled dataset through human analysis, which serves as the ground truth for training a predictive maintenance model. Manual labeling allows you to identify instances of actual failures and non-failure states in the historical performance data. Once the dataset is labeled, you can train a machine learning model to detect patterns associated with potential server failures.
upvoted 1 times
pico
1 year, 11 months ago
why not B (or C): While heuristics can be quick to implement, they may lack accuracy and may not capture complex patterns associated with server failures. Additionally, using a heuristic alone might not provide the necessary foundation for a robust predictive maintenance model.
upvoted 1 times
Werner123
1 year, 8 months ago
Google Rules of ML: Rule #1: Don’t be afraid to launch a product without machine learning. https://developers.google.com/machine-learning/guides/rules-of-ml#rule_1_don%E2%80%99t_be_afraid_to_launch_a_product_without_machine_learning
upvoted 3 times
...
...
...
pico
1 year, 12 months ago
Selected Answer: C
Option B also falls short as it focuses on real-time monitoring based on a heuristic but doesn't utilize historical data to create a predictive model. This approach might raise false alarms and lacks the ability to learn from the data over time.
upvoted 2 times
...
Krish6488
2 years ago
Selected Answer: C
Clearly the ask is an approach to build an ML application to detect potential server failures. Using labelled data to monitor it in real time does not give a proactive solution rather it becomes a reactive solution. I will go with C
upvoted 2 times
Werner123
1 year, 8 months ago
It does not say use ML. It only says a predictive maintenance solution, that could be using a simple heuristic.
upvoted 2 times
...
...
M25
2 years, 6 months ago
Selected Answer: C
The goal / “your” task is to predict or “build a predictive maintenance solution”, i.e., “Train a model to predict anomalies” [Option C]; not to perform monitoring or “to monitor server performance in real time” [Option B], there is a whole team “responsible for server maintenance”. The “do first” part refers to the use of a simple heuristic for initial labeling, not to what to do with the results of it. The more sophisticated solution: https://cloud.google.com/blog/products/ai-machine-learning/event-monitoring-with-explanations-on-the-google-cloud.
upvoted 2 times
M25
2 years, 6 months ago
Changed to B, based on the comparison with #94, assuming that by “Use this heuristic to monitor server performance in real time” is meant to “first” test this heuristic for labelling in a Prod. environment, as a quick reality-check, before training a whole model on a roughly inaccurate labelled dataset.
upvoted 3 times
pico
1 year, 11 months ago
why do you assume that this needs to be done "quick" instead of "good"?
upvoted 1 times
...
...
...
TNT87
2 years, 7 months ago
Selected Answer: B
ANSWER B
upvoted 2 times
...
osaka_monkey
2 years, 8 months ago
Selected Answer: B
https://developers.google.com/machine-learning/guides/rules-of-ml
upvoted 3 times
YushiSato
1 year, 3 months ago
https://developers.google.com/machine-learning/guides/rules-of-ml#rule_1_don%E2%80%99t_be_afraid_to_launch_a_product_without_machine_learning > Rule #1: Don’t be afraid to launch a product without machine learning. > Machine learning is cool, but it requires data. Theoretically, you can take data from a different problem and then tweak the model for a new product, but this will likely underperform basic heuristics. If you think that machine learning will give you a 100% boost, then a heuristic will get you 50% of the way there. > For instance, if you are ranking apps in an app marketplace, you could use the install rate or number of installs as heuristics. If you are detecting spam, filter out publishers that have sent spam before. Don’t be afraid to use human editing either. If you need to rank contacts, rank the most recently used highest (or even rank alphabetically). If machine learning is not absolutely required for your product, don't use it until you have data.
upvoted 1 times
...
...
TNT87
2 years, 8 months ago
Selected Answer: D
D. Hire a team of qualified analysts to review and label the machines' historical performance data. Training a model based on this manually labeled dataset would be the most accurate and effective approach. Developing a simple heuristic to label the machines' historical performance data may not be accurate enough to detect all potential failures, and training a model without labeled data could result in poor performance. Additionally, it's important to ensure that the team of analysts is qualified and experienced in labeling this type of data accurately to ensure the model is trained with high-quality labeled data.
upvoted 1 times
TNT87
2 years, 7 months ago
Answer B
upvoted 1 times
...
...
Scipione_
2 years, 8 months ago
Selected Answer: C
I like this question because it's helpful to remember that ML is used when needed. In this case you have unlabeled target classes so you can use unsupervised learning techniques like clustering to identify patterns or just develop a heuristic method. Answer 'B' in my opinion.
upvoted 1 times
Scipione_
2 years, 8 months ago
sorry I meant 'B'
upvoted 1 times
...
...
TNT87
2 years, 10 months ago
Selected Answer: B
https://developers.google.com/machine-learning/guides/rules-of-ml Answer B
upvoted 2 times
...
hiromi
2 years, 10 months ago
Selected Answer: B
B - https://developers.google.com/machine-learning/guides/rules-of-ml
upvoted 2 times
...
mymy9418
2 years, 10 months ago
Selected Answer: B
B should be first to do
upvoted 1 times
...
mil_spyro
2 years, 11 months ago
Selected Answer: D
Vote D
upvoted 1 times
mil_spyro
2 years, 10 months ago
Should be B
upvoted 2 times
...
...

Topic 1 Question 140

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 140 discussion

You work for a retailer that sells clothes to customers around the world. You have been tasked with ensuring that ML models are built in a secure manner. Specifically, you need to protect sensitive customer data that might be used in the models. You have identified four fields containing sensitive data that are being used by your data science team: AGE, IS_EXISTING_CUSTOMER, LATITUDE_LONGITUDE, and SHIRT_SIZE. What should you do with the data before it is made available to the data science team for training purposes?

  • A. Tokenize all of the fields using hashed dummy values to replace the real values.
  • B. Use principal component analysis (PCA) to reduce the four sensitive fields to one PCA vector.
  • C. Coarsen the data by putting AGE into quantiles and rounding LATITUDE_LONGITUDE into single precision. The other two fields are already as coarse as possible.
  • D. Remove all sensitive data fields, and ask the data science team to build their models using non-sensitive data.
Show Suggested Answer Hide Answer
Suggested Answer: A 🗳️
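A rough sketch of option A's tokenization, using a keyed hash so the same input always maps to the same token but the real value is not recoverable without the key. The field values and key are illustrative; in practice a managed service such as Cloud DLP would handle tokenization and key management.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical key

def tokenize(value: str) -> str:
    """Replace a sensitive value with a deterministic keyed hash.
    Deterministic mapping preserves joins and group-by semantics for
    the data science team without exposing the raw value."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

row = {"AGE": "34", "IS_EXISTING_CUSTOMER": "true",
       "LATITUDE_LONGITUDE": "40.71,-74.00", "SHIRT_SIZE": "M"}
tokenized = {k: tokenize(v) for k, v in row.items()}
print(tokenized)
```

Note the trade-off raised in the comments below: hashing preserves equality but destroys ordering, so numeric trends in AGE are lost, which is the argument some voters make for coarsening (option C) instead.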

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
Scipione_
Highly Voted 2 years, 8 months ago
Selected Answer: A
B -> possible in general but not suitable in this case, since you don't know that AGE, IS_EXISTING_CUSTOMER, LATITUDE_LONGITUDE, and SHIRT_SIZE are the first components in PCA. C -> you are changing data which could be highly correlated with the output. D -> same objection as C. Answer A uses hashing, so you encrypt the data without losing relevant information.
upvoted 5 times
...
desertlotus1211
Most Recent 8 months, 1 week ago
Selected Answer: C
Answer A will strip out the ordinal or numerical relationships present in the data, which can be crucial for model performance
upvoted 2 times
...
phani49
10 months, 3 weeks ago
Selected Answer: C
1. Coarsened fields:
• AGE into quantiles: age is a continuous variable and highly sensitive. Converting it into quantiles (e.g., age ranges) reduces granularity and protects individual privacy while preserving utility for modeling.
• Rounding LATITUDE_LONGITUDE: latitude and longitude provide precise location information, which can lead to privacy risks. Rounding to single precision (e.g., reducing decimal places) anonymizes the data while retaining geographical relevance for modeling.
2. Existing fields:
• IS_EXISTING_CUSTOMER and SHIRT_SIZE: these fields are already coarse and unlikely to reveal sensitive information directly (e.g., boolean for IS_EXISTING_CUSTOMER and categorical for SHIRT_SIZE), so no further processing is required.
upvoted 3 times
...
b7ef5e3
11 months, 3 weeks ago
Selected Answer: C
Between A and C; however, A would not work well for ordinal data like age and lat/long. By hashing, you create discrete categories rather than ordered ones, making it difficult to find trends against other data. A might be the more practical choice if binning were incorporated beforehand.
upvoted 2 times
...
bobjr
1 year, 5 months ago
Selected Answer: C
The best approach is C: coarsen the data by putting AGE into quantiles and rounding LATITUDE_LONGITUDE into single precision; the other two fields are already as coarse as possible. Here's why:
Preserves utility: coarsening the data reduces its sensitivity while retaining some of its informational value for modeling. Age quantiles and approximate location can still be useful features for certain types of models.
Minimizes risk: by removing the exact age and precise location, you significantly reduce the risk of re-identification or misuse of sensitive information.
Practicality: coarsening is a relatively simple technique to implement and doesn't require complex transformations or additional model training.
upvoted 4 times
...
pico
1 year, 11 months ago
Selected Answer: D
This approach involves not providing the sensitive fields (AGE, IS_EXISTING_CUSTOMER, LATITUDE_LONGITUDE, and SHIRT_SIZE) to the data science team for model training. Instead, the team can focus on building models using non-sensitive data. This helps to mitigate the risk of exposing sensitive customer information during the development and training process. While options A, B, and C propose different methods of obfuscating or transforming the sensitive data, they may introduce complexities and potential risks. Tokenizing with hashed dummy values (option A) may not be foolproof in terms of security, and PCA (option B) may not effectively retain the necessary information for accurate modeling. Coarsening the data (option C) might still retain some level of identifiable information, and it may not be sufficient for ensuring the privacy of sensitive data.
upvoted 1 times
LFavero
1 year, 8 months ago
why would you remove potential important features from the training?
upvoted 3 times
...
...
M25
2 years, 6 months ago
Selected Answer: A
Went with A
upvoted 3 times
...
TNT87
2 years, 8 months ago
Selected Answer: D
D. Remove all sensitive data fields, and ask the data science team to build their models using non-sensitive data. This is the best approach to protect sensitive customer data. Removing the sensitive fields is the most secure option because it eliminates the risk of any potential data breaches. Tokenizing or coarsening the data may still reveal sensitive information if the hashed dummy values can be reversed or if the coarsening can be used to identify individual customers. PCA can also be a useful technique to reduce dimensionality and protect privacy, but it may not be appropriate in this case because it is not clear how the sensitive fields can be combined into a single PCA vector without losing information.
upvoted 1 times
tavva_prudhvi
2 years, 3 months ago
Removing all sensitive data fields (Option D) would likely limit the effectiveness of the machine learning model, as important predictive variables would be excluded from the training process. It is important to balance privacy considerations with the need to train accurate models that can provide valuable insights and predictions.
upvoted 1 times
pico
1 year, 11 months ago
But in option A, Hashing can result in information loss. While the original values are hidden, the hashed values might not retain the same level of information, which can impact the effectiveness of the machine learning models.
upvoted 1 times
...
...
...
ares81
2 years, 10 months ago
Selected Answer: A
Hashing --> A
upvoted 4 times
...
TNT87
2 years, 10 months ago
Selected Answer: A
Answer A
upvoted 3 times
...
hiromi
2 years, 10 months ago
Selected Answer: A
A (by experience)
upvoted 3 times
hiromi
2 years, 10 months ago
https://cloud.google.com/blog/products/identity-security/take-charge-of-your-data-how-tokenization-makes-data-usable-without-sacrificing-privacy
upvoted 4 times
...
...
mymy9418
2 years, 10 months ago
Selected Answer: A
I think hash should be better
upvoted 2 times
...
mil_spyro
2 years, 11 months ago
Selected Answer: D
Removing the sensitive data fields is the safest and most effective way to ensure that customer data is not used in the training of your models.
upvoted 3 times
hiromi
2 years, 10 months ago
see https://cloud.google.com/blog/products/identity-security/take-charge-of-your-data-how-tokenization-makes-data-usable-without-sacrificing-privacy
upvoted 1 times
...
tavva_prudhvi
2 years, 3 months ago
Removing all sensitive data fields (Option D) would likely limit the effectiveness of the machine learning model, as important predictive variables would be excluded from the training process. It is important to balance privacy considerations with the need to train accurate models that can provide valuable insights and predictions.
upvoted 1 times
...
...

Topic 1 Question 141

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 141 discussion

You work for a magazine publisher and have been tasked with predicting whether customers will cancel their annual subscription. In your exploratory data analysis, you find that 90% of individuals renew their subscription every year, and only 10% of individuals cancel their subscription. After training a NN Classifier, your model predicts those who cancel their subscription with 99% accuracy and predicts those who renew their subscription with 82% accuracy. How should you interpret these results?

  • A. This is not a good result because the model should have a higher accuracy for those who renew their subscription than for those who cancel their subscription.
  • B. This is not a good result because the model is performing worse than predicting that people will always renew their subscription.
  • C. This is a good result because predicting those who cancel their subscription is more difficult, since there is less data for this group.
  • D. This is a good result because the accuracy across both groups is greater than 80%.
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️
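The arithmetic behind the B-versus-C debate in the comments can be checked directly: weight each group's accuracy by its share of the population and compare against the trivial always-renew baseline.

```python
# Class shares and per-class accuracies from the question.
p_cancel, p_renew = 0.10, 0.90
acc_cancel, acc_renew = 0.99, 0.82

# Overall accuracy is the class-weighted average of per-class accuracy.
model_accuracy = p_cancel * acc_cancel + p_renew * acc_renew
baseline_accuracy = p_renew  # always predict "renew"

print(f"model:    {model_accuracy:.3f}")    # model:    0.837
print(f"baseline: {baseline_accuracy:.3f}")  # baseline: 0.900
```

So on overall accuracy the model (83.7%) underperforms the naive baseline (90%), which is the case for B; the case for C is that the task is specifically to catch cancellations, where the model's 99% per-class accuracy is the metric that matters.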

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
TNT87
Highly Voted 2 years, 10 months ago
Selected Answer: C
Answer C
upvoted 8 times
...
Voyager2
Highly Voted 2 years, 5 months ago
Selected Answer: C
Went with C: This is a good result because predicting those who cancel their subscription is more difficult, since there is less data for this group My Reason: "You have been tasked with predicting whether customers will cancel their annual subscription." And in that task you are getting 99% of accuracy
upvoted 5 times
...
4d742d7
Most Recent 5 months ago
Selected Answer: B
In imbalanced classification problems (like this one, where 90% renew and 10% cancel), accuracy alone is misleading. Here's why: Let's break it down: Suppose you always predict "renew" (i.e., the majority class). You’d be right 90% of the time. That’s already 90% accuracy with no model at all.
upvoted 2 times
...
desertlotus1211
8 months, 1 week ago
Selected Answer: B
Because 90% of subscribers renew, a simple baseline that always predicts "renew" would achieve 90% accuracy overall. In your model, while the cancellation (minority class) accuracy is high at 99%, the accuracy for renewals (the majority class) is only 82%. This means that when predicting the renewals, the model is doing significantly worse than the baseline of always predicting renewal.
upvoted 1 times
...
Amer95
8 months, 1 week ago
Selected Answer: B
Correct Answer: This is not a good result because the model is performing worse than a simple heuristic of predicting that everyone will renew. Why? • A simple heuristic (predicting all renewals) achieves 90% accuracy. • The model’s 82% accuracy for renewals is worse than this baseline, meaning it adds no real value. • The 99% accuracy for cancellations is misleading, likely due to severe class imbalance. • Alternative metrics (precision, recall, F1-score) would provide a clearer picture of actual performance.
upvoted 1 times
...
phani49
10 months, 3 weeks ago
Selected Answer: B
Option B: "This is not a good result because the model is performing worse than predicting that people will always renew their subscription." Note: In practice, you'd also consider precision, recall, and business objectives. For example, if your business goal is to identify and retain canceling customers before they leave, a model with higher recall for cancellations might be beneficial despite the lower overall accuracy. But within the context of the given multiple-choice answers and the question's framing, B is the correct interpretation.
upvoted 1 times
...
lunalongo
11 months, 1 week ago
Selected Answer: B
B)
- Saying "good result" is premature: no precision, recall, or F1 was given.
- High accuracy on the minority class could be overfitting.
- If the model always predicted renewal, it would achieve 90% accuracy.
- 82% accuracy for renewals shows it isn't much better than a naive prediction.
upvoted 1 times
...
f084277
12 months ago
Selected Answer: C
C. The goal of the model is to predict CANCELLATIONS, not renewals
upvoted 1 times
...
pico
1 year, 11 months ago
Selected Answer: B
Here's the reasoning: The overall renewal rate is 90%, meaning that if the model simply predicted that everyone would renew, it would have an accuracy of 90%. The model's accuracy for predicting renewals (82%) is lower than this baseline accuracy. The model's accuracy for predicting cancellations is high (99%), but this could be misleading. If only 10% of individuals cancel their subscription, a model that predicts no cancellations at all would still have a high accuracy of 90%. Therefore, the high accuracy for cancellations may not be very informative. In summary, the model is not performing well, especially when compared to a simple baseline of always predicting renewals.
upvoted 4 times
pico
1 year, 11 months ago
C suggests that predicting cancellations is more difficult due to less data for this group. While it's true that imbalanced datasets, where one class is underrepresented, can pose challenges for machine learning models, the key issue here is that the model's accuracy for predicting renewals is lower than the accuracy for predicting cancellations. In this scenario, the imbalance alone does not explain the lower accuracy for renewals. The model should ideally perform well on both classes, and the fact that it doesn't, especially when compared to a simple baseline of always predicting renewals (which would have an accuracy of 90%), suggests that there's a problem with the model's performance. Therefore, option B is a more appropriate interpretation, highlighting that the model is performing worse than a basic strategy of always predicting renewals.
upvoted 2 times
...
ccb23cc
1 year, 5 months ago
If we suppose the case where the model simply predicted that everyone would renew, the renewal rate would always be higher than the cancellation rate. This case therefore means the model made some assumptions about what a cancellation looks like and misclassified some of the renewal cases (it could have made wrong assumptions because there is little cancellation data).
upvoted 1 times
...
...
tavva_prudhvi
2 years, 4 months ago
Selected Answer: C
Since there is less data for this group, predicting cancellations is harder. While the accuracy for predicting subscription renewals is lower, it is still above chance and may still be useful. Additionally, the high accuracy for predicting cancellations is promising, as this is the group of interest for the publisher. However, it would still be important to assess the model's precision and recall to fully evaluate its performance.
upvoted 2 times
...
ShePiDai
2 years, 5 months ago
Selected Answer: B
The task is to predict whether a customer will cancel their subscription, so both renew and cancel predictions matter. The overall accuracy is 99% × 10% + 82% × 90% ≈ 83.7%, while always guessing renew gives 90% accuracy.
upvoted 2 times
...
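ShePiDai's arithmetic can be extended to a full confusion matrix consistent with the stated per-class accuracies (the 1,000-subscriber counts below are hypothetical), which also exposes the weak precision for the cancel class:

```python
# Per 1,000 subscribers: 100 cancel (10%), 900 renew (90%).
# Cancel accuracy (recall) 99% -> 99 true positives, 1 false negative.
# Renew accuracy 82%         -> 738 true negatives, 162 false positives.
tp, fn = 99, 1
tn, fp = 738, 162

overall_accuracy = (tp + tn) / (tp + fn + tn + fp)  # 0.837, below the 0.9 baseline
cancel_precision = tp / (tp + fp)                   # most "cancel" flags are wrong
print(overall_accuracy, round(cancel_precision, 3))
```

So even though cancel recall is 99%, only about 38% of predicted cancellations are real ones, which is why accuracy alone paints a misleading picture here.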
M25
2 years, 6 months ago
Selected Answer: B
#ResponsibleAI, predicting the majority class (imbalanced data) topic: “the model [82% accuracy for renew] is performing worse than predicting that people will always [90% accuracy] renew their subscription”. https://developers.google.com/machine-learning/crash-course/classification/check-your-understanding-accuracy-precision-recall “A deadly, but curable, medical condition afflicts .01% of the population. An ML model (…) predicts (…) with an accuracy of 99.99%. (…) After all, even a "dumb" model that always predicts "not sick" would still be 99.99% accurate.“
upvoted 2 times
...
lucaluca1982
2 years, 6 months ago
Selected Answer: B
The 82% accuracy for renewals is lower than a naive model that always predicts renewals (which would have a 90% accuracy).
upvoted 1 times
f084277
12 months ago
The goal of the model is to predict cancellations, not renewals...
upvoted 1 times
...
...
Scipione_
2 years, 8 months ago
Selected Answer: C
I think C is the only way
upvoted 3 times
...
John_Pongthorn
2 years, 8 months ago
Selected Answer: C
We can reason through the options. A doesn't make sense: given cancel = 99% but renew = 82%, there is no way the renew class (82%) beats the cancel class (99%); that would require 100% accuracy. B: the cancel class has higher accuracy than renew (99% > 82%). D: you could justify both as good around 80% only if we had balanced classes. So that leaves us with C: this model predicts well under imbalanced-class circumstances, where the target class has 10 samples for every 90 in the other.
upvoted 2 times
...
ares81
2 years, 10 months ago
Selected Answer: C
Logically, it should be C.
upvoted 2 times
...
Dataspire
2 years, 10 months ago
Selected Answer: A
Since 90% of the dataset represents customers who will renew their subscription, accuracy should have been greater than 82%.
upvoted 1 times
John_Pongthorn
2 years, 8 months ago
I think C is the most likely. We are dealing with an imbalanced dataset, and the target class is cancellations. In this case we should actually use other metrics like F1 and precision/recall. If you want renewal accuracy higher than cancellation accuracy, you would have to make it greater than 99%, which is hard to achieve.
upvoted 1 times
...
...

Topic 1 Question 142

Exam Professional Machine Learning Engineer topic 1 question 142 discussion

You have built a model that is trained on data stored in Parquet files. You access the data through a Hive table hosted on Google Cloud. You preprocessed these data with PySpark and exported it as a CSV file into Cloud Storage. After preprocessing, you execute additional steps to train and evaluate your model. You want to parametrize this model training in Kubeflow Pipelines. What should you do?

  • A. Remove the data transformation step from your pipeline.
  • B. Containerize the PySpark transformation step, and add it to your pipeline.
  • C. Add a ContainerOp to your pipeline that spins a Dataproc cluster, runs a transformation, and then saves the transformed data in Cloud Storage.
  • D. Deploy Apache Spark at a separate node pool in a Google Kubernetes Engine cluster. Add a ContainerOp to your pipeline that invokes a corresponding transformation job for this Spark instance.
Suggested Answer: C 🗳️

Comments

mil_spyro
Highly Voted 2 years, 11 months ago
Selected Answer: C
This will allow you to reuse the same pipeline for different datasets without needing to manually preprocess and transform the data each time.
upvoted 8 times
...
tavva_prudhvi
Highly Voted 2 years, 4 months ago
Selected Answer: C
Since the data is stored in Parquet format, it's more efficient to use Spark to transform it. Containerizing the PySpark transformation step and adding it to the pipeline may not be the optimal solution since it may require additional resources to run this container. Deploying Apache Spark at a separate node pool in a Google Kubernetes Engine cluster and adding a ContainerOp to invoke a corresponding transformation job for this Spark instance is also a possible solution, but it may require more setup and configuration. Using Dataproc can simplify this process since it's a fully managed service that simplifies running Apache Spark and Hadoop clusters. A ContainerOp can be added to the pipeline to spin up a Dataproc cluster, run the transformation using PySpark, and save the transformed data in Cloud Storage. This solution is more efficient since Dataproc can scale the cluster based on the size of the data and the complexity of the transformation.
upvoted 7 times
...
momosoundz
Most Recent 2 years, 4 months ago
Selected Answer: B
You can containerize the transformation and then save the output to Cloud Storage.
upvoted 2 times
tavva_prudhvi
2 years, 3 months ago
It is not the most efficient or scalable solution when working with big data in the context of Google Cloud.
upvoted 1 times
...
...
M25
2 years, 6 months ago
Selected Answer: C
https://kubeflow-pipelines.readthedocs.io/en/stable/source/kfp.dsl.html#kfp.dsl.ContainerOp https://medium.com/@vignesh093/running-preprocessing-and-ml-workflow-in-kubeflow-with-google-dataproc-84103a9ef67e
upvoted 1 times
...
TNT87
2 years, 8 months ago
Selected Answer: C
C. Add a ContainerOp to your pipeline that spins a Dataproc cluster, runs a transformation, and then saves the transformed data in Cloud Storage. The recommended approach to parametrize the model training in Kubeflow Pipelines would be to add a ContainerOp to the pipeline that spins up a Dataproc cluster, runs the PySpark transformation step, and saves the transformed data in Cloud Storage. This approach allows for easy integration of PySpark transformations with Kubeflow Pipelines while taking advantage of the scalability and efficiency of Dataproc.
upvoted 2 times
...
chidstar
2 years, 8 months ago
Selected Answer: B
All the wrong answers on this site really baffle me...correct answer is B... you must containerize your component for Kubeflow to run it. https://www.kubeflow.org/docs/components/pipelines/v1/sdk/component-development/#containerize-your-components-code
upvoted 6 times
TNT87
2 years, 8 months ago
C. Add a ContainerOp to your pipeline that spins a Dataproc cluster, runs a transformation, and then saves the transformed data in Cloud Storage. The recommended approach to parametrize the model training in Kubeflow Pipelines would be to add a ContainerOp to the pipeline that spins up a Dataproc cluster, runs the PySpark transformation step, and saves the transformed data in Cloud Storage. This approach allows for easy integration of PySpark transformations with Kubeflow Pipelines while taking advantage of the scalability and efficiency of Dataproc.
upvoted 3 times
...
f084277
12 months ago
The doc you linked literally says to use ContainerOp in the documentation. The answer is C.
upvoted 1 times
...
...
TNT87
2 years, 10 months ago
Selected Answer: C
Answer C
upvoted 2 times
...

Topic 1 Question 143

Exam Professional Machine Learning Engineer topic 1 question 143 discussion

You have developed an ML model to detect the sentiment of users’ posts on your company's social media page to identify outages or bugs. You are using Dataflow to provide real-time predictions on data ingested from Pub/Sub. You plan to have multiple training iterations for your model and keep the latest two versions live after every run. You want to split the traffic between the versions in an 80:20 ratio, with the newest model getting the majority of the traffic. You want to keep the pipeline as simple as possible, with minimal management required. What should you do?

  • A. Deploy the models to a Vertex AI endpoint using the traffic-split=0=80, PREVIOUS_MODEL_ID=20 configuration.
  • B. Wrap the models inside an App Engine application using the --splits PREVIOUS_VERSION=0.2, NEW_VERSION=0.8 configuration
  • C. Wrap the models inside a Cloud Run container using the REVISION1=20, REVISION2=80 revision configuration.
  • D. Implement random splitting in Dataflow using beam.Partition() with a partition function calling a Vertex AI endpoint.
Suggested Answer: A 🗳️

Comments

TNT87
Highly Voted 2 years, 2 months ago
Selected Answer: A
A. Deploy the models to a Vertex AI endpoint using the traffic-split=0=80, PREVIOUS_MODEL_ID=20 configuration. The recommended approach to achieve the desired outcome would be to deploy the ML models to a Vertex AI endpoint and configure the traffic splitting using the traffic-split parameter. The traffic-split parameter enables you to split traffic between multiple versions of a model based on a percentage split. In this case, the newest model should receive the majority of the traffic, which can be achieved by setting the traffic-split parameter to 0=80. The previous version of the model should receive the remaining 20% of the traffic, which can be achieved by setting the PREVIOUS_MODEL_ID parameter to 20.
upvoted 9 times
...
fitri001
Most Recent 1 year ago
Selected Answer: A
Vertex AI Traffic Splitting: Vertex AI natively supports traffic splitting between deployed models through the traffic-split parameter. This allows you to specify the desired traffic distribution (80% to the newest model, 20% to the previous one) during deployment.
upvoted 2 times
...
M25
2 years ago
Selected Answer: A
Went with A
upvoted 2 times
...
FherRO
2 years, 2 months ago
Selected Answer: A
I think it is A, because traffic can be split across different versions when using endpoints: https://cloud.google.com/vertex-ai/docs/general/deployment#models-endpoint. The --traffic-split flag does exist, but the syntax in the question is incorrect; it should be "--traffic-split=MODEL_ID_1=value,MODEL_ID_2=value" as explained in https://cloud.google.com/sdk/gcloud/reference/ai/endpoints/deploy-model
upvoted 4 times
...

Topic 1 Question 144

Exam Professional Machine Learning Engineer topic 1 question 144 discussion

You are developing an image recognition model using PyTorch based on ResNet50 architecture. Your code is working fine on your local laptop on a small subsample. Your full dataset has 200k labeled images. You want to quickly scale your training workload while minimizing cost. You plan to use 4 V100 GPUs. What should you do?

  • A. Create a Google Kubernetes Engine cluster with a node pool that has 4 V100 GPUs. Prepare and submit a TFJob operator to this node pool.
  • B. Create a Vertex AI Workbench user-managed notebooks instance with 4 V100 GPUs, and use it to train your model.
  • C. Package your code with Setuptools, and use a pre-built container. Train your model with Vertex AI using a custom tier that contains the required GPUs.
  • D. Configure a Compute Engine VM with all the dependencies that launches the training. Train your model with Vertex AI using a custom tier that contains the required GPUs.
Suggested Answer: C 🗳️

Comments

John_Pongthorn
Highly Voted 2 years, 2 months ago
Selected Answer: C
Custom training. Don't overthink it; this is Google's recommendation. You don't need Vertex AI Workbench user-managed notebooks, Google Kubernetes Engine, or Compute Engine at all; they are a waste of effort. https://cloud.google.com/vertex-ai/docs/training/configure-compute#specifying_gpus You can choose whichever GPUs you want.
upvoted 12 times
...
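Following the linked docs, option C boils down to a single `gcloud ai custom-jobs create` call. A hedged sketch that assembles the worker-pool spec with the 4 V100s (the image URI, module name, and display name are placeholder assumptions; the executor image would be one of Vertex AI's pre-built PyTorch training containers):

```python
# Hypothetical worker-pool spec for a single replica with 4 V100 GPUs.
spec = {
    "machine-type": "n1-standard-8",
    "replica-count": 1,
    "accelerator-type": "NVIDIA_TESLA_V100",
    "accelerator-count": 4,
    "executor-image-uri": "us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-13:latest",
    "python-module": "trainer.task",   # the Setuptools-packaged entry point
}
worker_pool = ",".join(f"{k}={v}" for k, v in spec.items())
cmd = (
    "gcloud ai custom-jobs create --region=us-central1 "
    "--display-name=resnet50-train "
    f"--worker-pool-spec={worker_pool}"
)
print(cmd)
```

Because Vertex AI provisions the machines only for the duration of the job, this is also the cost-minimizing choice compared with a persistent notebook or GKE cluster.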
pawan94
Most Recent 1 year ago
Why in the world would you set up a Compute Engine VM when your custom training job on Vertex runs "serverless"? At least from the user's side, you don't have to maintain the VM. You literally just have to select the region, machine type, and accelerators.
upvoted 1 times
...
fitri001
1 year ago
Selected Answer: C
Pre-built container: Utilizing a pre-built PyTorch container image eliminates the need to manage dependencies within your container, saving time and simplifying the process. Vertex AI custom tier: Vertex AI custom tiers allow you to configure a machine type with the desired GPUs (4 V100 in this case) and pay only for the resources you use. This is more cost-effective than managing a dedicated VM instance. Setuptools packaging: Packaging your code with tools like Setuptools ensures all necessary libraries and scripts are included within the container, creating a self-contained training environment.
upvoted 4 times
...
Mickey321
1 year, 6 months ago
Selected Answer: C
Using Vertex AI allows you to easily leverage multiple GPUs without managing infrastructure yourself. The custom tier gives you control to specify 4 V100 GPUs. Packaging with Setuptools and using a pre-built container ensures a consistent and portable environment with all dependencies readily available. This approach minimizes overhead and cost by relying on Vertex AI's managed service instead of setting up your own Kubernetes cluster or VMs.
upvoted 1 times
...
PST21
1 year, 9 months ago
Correct Ans is C. Below mentioned why B is incorrect.
upvoted 1 times
...
PST21
1 year, 9 months ago
Selected Answer: B
Option B (using a Vertex AI Workbench user-managed notebooks instance with 4 V100 GPUs) is more suitable for interactive data exploration and experimentation rather than large-scale model training. Vertex AI Workbench is designed for collaborative data science, but using it for model training might not be the most efficient approach.
upvoted 1 times
...
julliet
1 year, 11 months ago
What is Setuptools?
upvoted 1 times
julliet
1 year, 11 months ago
Found it. Python package that provides a mechanism for packaging, distributing, and installing Python libraries or modules.
upvoted 1 times
...
...
M25
2 years ago
Selected Answer: C
“Vertex AI provides flexible and scalable hardware and secured infrastructure to train PyTorch based deep learning models with pre-built containers and custom containers. (…) use PyTorch ResNet-50 as the example model and train it on ImageNet validation data (50K images) to measure the training performance for different training strategies”: https://cloud.google.com/blog/products/ai-machine-learning/efficient-pytorch-training-with-vertex-ai
upvoted 1 times
M25
2 years ago
There is no indication otherwise why there would be a need for full control over the environment, provided by “user-managed workbooks” within the Vertex AI Workbench [Option B], except for the “plan to use 4 V100 GPUs”, but one can do that with “managed workbooks” as well: https://cloud.google.com/vertex-ai/docs/workbench/notebook-solution#control_your_hardware_and_framework_from_jupyterlab
upvoted 1 times
...
...
frangm23
2 years ago
Can someone explain why is B wrong?
upvoted 2 times
andresvelasco
1 year, 8 months ago
Very likely because of the consideration: "You want to quickly scale your training workload while minimizing cost". But I agree with you ... I chose B (notebook) thinking the question was more oriented toward quickly achieving an MVP.
upvoted 1 times
...
...
TNT87
2 years ago
Selected Answer: C
Answer C Option A involves using Google Kubernetes Engine, which is a platform for deploying, managing, and scaling containerized applications. However, it requires more setup time and knowledge of Kubernetes, which might not be ideal for quickly scaling up training workloads. Furthermore, the use of the TensorFlow Job operator seems inappropriate for a PyTorch-based model.
upvoted 1 times
...
wlts
2 years, 1 month ago
Select C
upvoted 1 times
wlts
2 years, 1 month ago
The TFJob operator is designed for TensorFlow workloads, not PyTorch. So option A is out. Vertex AI Workbench is primarily designed for interactive work with Jupyter Notebooks and not optimized for large-scale, long-running model training. Moreover, it may not provide the same level of cost optimization as Vertex AI Training, which automatically provisions and manages resources, and can scale down when not in use. So option B also out.
upvoted 1 times
...
...
TNT87
2 years, 2 months ago
C. Package your code with Setuptools, and use a pre-built container. Train your model with Vertex AI using a custom tier that contains the required GPUs. The recommended approach to scale the training workload while minimizing cost would be to package the code with Setuptools and use a pre-built container, then train the model with Vertex AI using a custom tier that contains the required GPUs. This approach allows for quick and easy scaling of the training workload while minimizing infrastructure management costs.
upvoted 1 times
...
FherRO
2 years, 2 months ago
Selected Answer: A
Vote for A, as you need to scale
upvoted 1 times
alelamb
2 years, 2 months ago
It clearly says a PyTorch model; you cannot use a TFJob.
upvoted 2 times
...
...
Scipione_
2 years, 2 months ago
Selected Answer: B
It's B according to me; since a Vertex AI notebook has all the dependencies for PyTorch, it is the fastest solution.
upvoted 2 times
tavva_prudhvi
1 year, 9 months ago
It involves using a managed notebooks instance, which might have limitations in terms of customizability and flexibility compared to a containerized approach.
upvoted 1 times
...
...
TNT87
2 years, 3 months ago
Selected Answer: A
Google Kubernetes Engine (GKE) is a powerful and easy-to-use platform for deploying and managing containerized applications. It allows you to create a cluster of virtual machines that are pre-configured with the necessary dependencies and resources to run your machine learning workloads. By creating a GKE cluster with a node pool that has 4 V100 GPUs, you can take advantage of the powerful processing capabilities of these GPUs to train your model quickly and efficiently. You can then use a Kubernetes framework such as the TFJob operator to submit the training job, which will automatically distribute the workload across the available GPUs. References: Google Kubernetes Engine, TFJob operator, Vertex AI.
upvoted 2 times
alelamb
2 years, 2 months ago
It clearly says a PyTorch model; you cannot use a TFJob.
upvoted 1 times
...
TNT87
2 years, 2 months ago
Answer C
upvoted 2 times
...
...

Topic 1 Question 145

Exam Professional Machine Learning Engineer topic 1 question 145 discussion

You have trained a DNN regressor with TensorFlow to predict housing prices using a set of predictive features. Your default precision is tf.float64, and you use a standard TensorFlow estimator:

(estimator definition shown as an image in the original question)

Your model performs well, but just before deploying it to production, you discover that your current serving latency is 10ms @ 90 percentile and you currently serve on CPUs. Your production requirements expect a model latency of 8ms @ 90 percentile. You're willing to accept a small decrease in performance in order to reach the latency requirement.
Therefore your plan is to improve latency while evaluating how much the model's prediction decreases. What should you first try to quickly lower the serving latency?

  • A. Switch from CPU to GPU serving.
  • B. Apply quantization to your SavedModel by reducing the floating point precision to tf.float16.
  • C. Increase the dropout rate to 0.8 and retrain your model.
  • D. Increase the dropout rate to 0.8 in _PREDICT mode by adjusting the TensorFlow Serving parameters.
Suggested Answer: B 🗳️

Comments

Tayoso
Highly Voted 1 year, 10 months ago
Selected Answer: B
Switching from CPU to GPU serving could also improve latency, but it may not be considered a "quick" solution compared to model quantization because it involves additional hardware requirements and potentially more complex deployment changes. Additionally, not all models see a latency improvement on GPUs, especially if the model is not large enough to utilize the GPU effectively or if the infrastructure does not support GPU optimizations. Therefore, the first thing to try would be quantization, which can be done relatively quickly and directly within the TensorFlow framework. After applying quantization, you should evaluate the model to ensure that the decrease in precision does not lead to an unacceptable drop in prediction accuracy.
upvoted 6 times
...
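The size/precision trade-off behind option B can be felt with nothing but the standard library: `struct` can pack IEEE half precision (format `'e'`, tf.float16-style storage) next to double precision (`'d'`, tf.float64-style storage):

```python
import struct

x = 3.14159  # a double-precision weight value

half = struct.pack("<e", x)  # 2 bytes of storage
full = struct.pack("<d", x)  # 8 bytes of storage

roundtrip = struct.unpack("<e", half)[0]
print(len(half), len(full))   # 2 8 -> weights shrink 4x going float64 -> float16
print(abs(roundtrip - x))     # small but nonzero precision loss
```

This is the same bargain the question describes: a quarter of the bytes to move and compute, at the cost of a small, measurable drop in numeric fidelity that must then be evaluated against prediction quality.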
fitri001
Highly Voted 1 year, 6 months ago
Selected Answer: B
Reduced model size: Quantization reduces the model size by using lower precision data types like tf.float16 instead of the default tf.float64. This smaller size leads to faster loading and processing during inference. Minimal performance impact: Quantization often introduces a small decrease in model accuracy, but it's a good initial step to explore due to the potential latency gains with minimal performance trade-offs.
upvoted 5 times
...
0e6b9e2
Most Recent 10 months, 2 weeks ago
Selected Answer: A
B is wrong because tf.float16 quantization requires GPU accelerators, and this setup is CPU-only. A isn't a great answer, but it's the next best choice. https://ai.google.dev/edge/litert/models/post_training_quantization#float16_quantization
upvoted 1 times
...
baimus
1 year, 2 months ago
Selected Answer: B
I know the answer is B because the question is telegraphing it so much: "You can lower quality a bit" (waggles eyebrows) obviously means quantizing (the other changes are silly). But in reality A would be the much more normal thing to do. It's unusual to even attempt serving an NN on a CPU these days.
upvoted 2 times
...
gscharly
1 year, 7 months ago
Selected Answer: B
I went with B.
upvoted 1 times
...
Carlose2108
1 year, 8 months ago
Selected Answer: B
I went with B.
upvoted 1 times
...
Mickey321
1 year, 12 months ago
Selected Answer: A
Very confusing A or B but leaning to A
upvoted 1 times
Mickey321
1 year, 12 months ago
Changed to B
upvoted 2 times
...
...
andresvelasco
2 years, 2 months ago
Selected Answer: B
B based on the consideration: "Therefore your plan is to improve latency while evaluating how much the model's prediction decreases"
upvoted 2 times
...
Voyager2
2 years, 5 months ago
Selected Answer: B
To me it is B: apply quantization to your SavedModel by reducing the floating point precision to tf.float16. Obviously switching to GPU improves latency, BUT it says "Therefore your plan is to improve latency while evaluating how much the model's prediction decreases." If you want to evaluate how much it decreases, it is because you are going to make changes that affect the prediction.
upvoted 4 times
julliet
2 years, 5 months ago
According to the documentation, we have to convert to TensorFlow Lite before applying quantization, or use an API: https://www.tensorflow.org/model_optimization/guide/quantization/training It doesn't look like the first option. Second, maybe?
upvoted 2 times
...
...
Voyager2
2 years, 5 months ago
Selected Answer: A
Going with A. My reason to discard B (from https://www.tensorflow.org/lite/performance/post_training_quantization#float16_quantization):
The advantages of float16 quantization are as follows:
  • It reduces model size by up to half (since all weights become half of their original size).
  • It causes minimal loss in accuracy.
  • It supports some delegates (e.g. the GPU delegate) which can operate directly on float16 data, resulting in faster execution than float32 computations.
The disadvantages of float16 quantization are as follows:
  • It does not reduce latency as much as a quantization to fixed point math.
  • By default, a float16 quantized model will "dequantize" the weights values to float32 when run on the CPU. (Note that the GPU delegate will not perform this dequantization, since it can operate on float16 data.)
upvoted 1 times
...
aryaavinash
2 years, 5 months ago
Going with B because quantization can reduce the model size and inference latency by using lower-precision arithmetic operations, while maintaining acceptable accuracy. The other options are either not feasible or not effective for lowering the serving latency. Switching from CPU to GPU serving may not be possible or cost-effective, increasing the dropout rate may degrade the model performance significantly, and dropout is not applied in _PREDICT mode by default.
upvoted 4 times
...
M25
2 years, 6 months ago
Selected Answer: A
For tf.float16 [Option B], we would have to be on TFLite: https://discuss.tensorflow.org/t/convert-tensorflow-saved-model-from-float32-to-float16/12130 and resp. https://www.tensorflow.org/lite/performance/post_training_quantization#float16_quantization (plus “By default, a float16 quantized model will "dequantize" the weights values to float32 when run on the CPU. (Note that the GPU delegate will not perform this dequantization, since it can operate on float16 data.)”
upvoted 2 times
M25
2 years, 6 months ago
But even before that, tf.estimator.DNNRegressor is deprecated, “Use tf.keras instead”: https://www.tensorflow.org/api_docs/python/tf/estimator/DNNRegressor. When used with Keras (a high-level NN library that runs on top of TF), for training though, “It is not recommended to set this to float16 for training, as this will likely cause numeric stability issues. Instead, mixed precision, which is using a mix of float16 and float32, can be used”: https://www.tensorflow.org/api_docs/python/tf/keras/backend/set_floatx.
upvoted 1 times
M25
2 years, 6 months ago
But then, “On CPUs, mixed precision will run significantly slower, however.”: https://www.tensorflow.org/guide/mixed_precision#supported_hardware. And, “The policy will run on other GPUs and CPUs but may not improve performance.”: https://www.tensorflow.org/guide/mixed_precision#setting_the_dtype_policy.
upvoted 1 times
...
...
M25
2 years, 6 months ago
“This can take around 500ms to process a single Tweet (of at most 128 tokens) on a CPU-based machine. The processing time can be greatly reduced to 20ms by running the model on a GPU instance (…). An option to dynamically quantize a TensorFlow model wasn’t available, so we updated the script to convert the TensorFlow models into TFLite and created the options to apply int8 or fp16 quantization.”: https://blog.twitter.com/engineering/en_us/topics/insights/2021/speeding-up-transformer-cpu-inference-in-google-cloud
upvoted 1 times
...
...
TNT87
2 years, 8 months ago
Selected Answer: A
A makes sense too
upvoted 4 times
TNT87
2 years, 8 months ago
But the answer is B; I don't know how I clicked A.
upvoted 2 times
...
TNT87
2 years, 8 months ago
GPU serving can significantly speed up the serving of models due to the parallel processing power of GPUs. By switching from CPU to GPU serving, you can quickly lower the serving latency without making changes to the model architecture or precision. Once you have switched to GPU serving, you can evaluate the impact on the model's prediction quality and consider further optimization techniques if necessary. Therefore, the correct option is A.
upvoted 2 times
...
...
TNT87
2 years, 9 months ago
Answer B. Applying quantization to your SavedModel by reducing the floating point precision can help reduce the serving latency by decreasing the amount of memory and computation required to make a prediction. TensorFlow provides tools such as the tf.quantization module that can be used to quantize models and reduce their precision, which can significantly reduce serving latency without a significant decrease in model performance.
upvoted 1 times
...
imamapri
2 years, 9 months ago
Selected Answer: B
Vote B. https://www.tensorflow.org/lite/performance/post_training_float16_quant
upvoted 4 times
...

Topic 1 Question 146

Exam Professional Machine Learning Engineer topic 1 question 146 discussion

You work on the data science team at a manufacturing company. You are reviewing the company’s historical sales data, which has hundreds of millions of records. For your exploratory data analysis, you need to calculate descriptive statistics such as mean, median, and mode; conduct complex statistical tests for hypothesis testing; and plot variations of the features over time. You want to use as much of the sales data as possible in your analyses while minimizing computational resources. What should you do?

  • A. Visualize the time plots in Google Data Studio. Import the dataset into Vertex AI Workbench user-managed notebooks. Use this data to calculate the descriptive statistics and run the statistical analyses.
  • B. Spin up a Vertex AI Workbench user-managed notebooks instance and import the dataset. Use this data to create statistical and visual analyses.
  • C. Use BigQuery to calculate the descriptive statistics. Use Vertex AI Workbench user-managed notebooks to visualize the time plots and run the statistical analyses.
  • D. Use BigQuery to calculate the descriptive statistics, and use Google Data Studio to visualize the time plots. Use Vertex AI Workbench user-managed notebooks to run the statistical analyses.
Suggested Answer: C 🗳️

Comments

andresvelasco
Highly Voted 1 year, 8 months ago
Selected Answer: D
I would go with D, thinking that Bigquery + Datastudio avoid having to load 100s of MILLIONS of records in memory for the most basic tasks, as required by the Notebook.
upvoted 6 times
...
gscharly
Most Recent 1 year ago
Selected Answer: C
went with c
upvoted 2 times
...
Aastha_Vashist
1 year, 1 month ago
Selected Answer: C
went with c
upvoted 2 times
...
pico
1 year, 6 months ago
Selected Answer: C
Option D is not as efficient because using Google Data Studio for time plots may not be as well-suited for handling large datasets, and it's more focused on data visualization. Option A involves importing data into Vertex AI Workbench first, which may not be the most efficient way to leverage BigQuery for handling large-scale data computations.
upvoted 3 times
...
friedi
1 year, 10 months ago
Selected Answer: D
D minimizes resources the most, since it minimizes the usage of Vertex AI notebooks, which basically require provisioning a VM in the background for the entire duration of development.
upvoted 3 times
...
bechir141bf
1 year, 11 months ago
Why not D ?
upvoted 4 times
tavva_prudhvi
1 year, 10 months ago
Using BigQuery to calculate the descriptive statistics is a good choice, but using Google Data Studio for visualizations may not be as flexible as using Vertex AI Workbench user-managed notebooks. Google Data Studio is a great tool for creating dashboards and reports, but it may not allow for the level of customization that is required for detailed exploratory data analysis.
upvoted 2 times
...
...
M25
2 years ago
Selected Answer: C
https://cloud.google.com/architecture/data-science-with-r-on-gcp-eda#ai_platform_notebooks https://cloud.google.com/vertex-ai-workbench#section-5
upvoted 1 times
...
TNT87
2 years, 2 months ago
Selected Answer: C
C. Use BigQuery to calculate the descriptive statistics. Use Vertex AI Workbench user-managed notebooks to visualize the time plots and run the statistical analyses. BigQuery is a powerful data analysis tool that can handle massive datasets, making it an ideal solution for calculating descriptive statistics for hundreds of millions of records. It can also perform complex statistical tests for hypothesis testing. For time series analysis, using Vertex AI Workbench user-managed notebooks would be the best solution, as it provides a flexible environment for data exploration, visualization, and statistical analysis. By using the two tools together, the data science team can efficiently analyze the sales data while minimizing computational resources. It's C, not B.
upvoted 3 times
Gudwin
2 years ago
What is the point of actually giving chatGPT answers, some of which are incorrect?
upvoted 7 times
...
...
TNT87
2 years, 2 months ago
Answer B https://cloud.google.com/vertex-ai-workbench
upvoted 2 times
TNT87
2 years, 2 months ago
Answer C not B
upvoted 1 times
...
...
imamapri
2 years, 3 months ago
Selected Answer: B
Vote B. You can do all of the tasks in Vertex AI Workbench while minimizing computational resources.
upvoted 4 times
tavva_prudhvi
1 year, 10 months ago
Option B is also a viable solution, but it has some drawbacks compared to option C. While it is true that you can spin up a Vertex AI Workbench user-managed notebooks instance and import the dataset, this option requires more computational resources and may not be as cost-effective as using BigQuery to calculate the descriptive statistics. Additionally, while you can create statistical and visual analyses within the Vertex AI Workbench user-managed notebooks, it may not be as easy to create custom visualizations as it is with Google Data Studio. Therefore, while option B is a valid solution, option C is likely to be more efficient and cost-effective, as it takes advantage of the strengths of each tool.
upvoted 1 times
...
...
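The split that answer C describes — heavy aggregation in BigQuery, plots and tests in a notebook — can be sketched as a single query. The table name below is hypothetical, and `APPROX_QUANTILES`/`APPROX_TOP_COUNT` are BigQuery's usual approximations for median and mode at this scale:

```python
# Option C, first half: push descriptive statistics into BigQuery so only a
# handful of aggregate rows ever leave the warehouse. Table name is a
# placeholder for illustration.
stats_query = """
SELECT
  AVG(sales) AS mean_sales,
  APPROX_QUANTILES(sales, 2)[OFFSET(1)] AS median_sales,
  APPROX_TOP_COUNT(sales, 1)[OFFSET(0)].value AS mode_sales
FROM `my-project.sales.historical_sales`
"""
# A Vertex AI Workbench notebook would then run this via the BigQuery client
# (e.g. bigquery.Client().query(stats_query)) and do the plotting and
# statistical tests on the tiny result set.
```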

Topic 1 Question 147

Exam Professional Machine Learning Engineer topic 1 question 147 discussion

Your data science team needs to rapidly experiment with various features, model architectures, and hyperparameters. They need to track the accuracy metrics for various experiments and use an API to query the metrics over time. What should they use to track and report their experiments while minimizing manual effort?

  • A. Use Vertex AI Pipelines to execute the experiments. Query the results stored in MetadataStore using the Vertex AI API.
  • B. Use Vertex AI Training to execute the experiments. Write the accuracy metrics to BigQuery, and query the results using the BigQuery API.
  • C. Use Vertex AI Training to execute the experiments. Write the accuracy metrics to Cloud Monitoring, and query the results using the Monitoring API.
  • D. Use Vertex AI Workbench user-managed notebooks to execute the experiments. Collect the results in a shared Google Sheets file, and query the results using the Google Sheets API.
Suggested Answer: A 🗳️

Comments

fitri001
1 year ago
Selected Answer: A
Vertex AI Pipelines: Pipelines are designed for automating experiment execution. You can define different steps like data preprocessing, training with various configurations, and evaluation. This allows rapid experimentation with minimal manual intervention.
Vertex ML Metadata: Vertex AI Pipelines integrate seamlessly with Vertex ML Metadata, which automatically tracks experiment runs, metrics, and artifacts. This eliminates the need for manual data collection in spreadsheets.
Vertex AI API: The Vertex AI API allows you to programmatically query the Vertex ML Metadata store. You can retrieve experiment details, including accuracy metrics, for further analysis or visualization.
upvoted 4 times
fitri001
1 year ago
B and C: Vertex AI Training alone lacks built-in experiment tracking, and BigQuery and Cloud Monitoring are not designed for it either: BigQuery excels at large-scale data analysis, and Cloud Monitoring is primarily for monitoring system health. While you could store metrics there, querying them for experiment comparisons would be cumbersome.
D: Manual collection in Google Sheets is highly error-prone and inefficient for rapid experimentation; version control and querying metrics across multiple experiments would be challenging.
upvoted 1 times
...
...
M25
2 years ago
Selected Answer: A
Went with A
upvoted 2 times
...
TNT87
2 years, 2 months ago
Selected Answer: A
Option A is the best approach to track and report experiments while minimizing manual effort. The Vertex AI Pipelines provide a powerful tool for automating machine learning workflows, including data preparation, training, and deployment. MetadataStore can be used to track the performance of different models by logging accuracy metrics and other important information. The Vertex AI API can then be used to query the metadata store and retrieve the results of different experiments.
upvoted 1 times
...
chidstar
2 years, 2 months ago
Selected Answer: A
Vertex AI Pipelines covers everything. "Vertex AI Pipelines helps you to automate, monitor, and govern your ML systems by orchestrating your ML workflow in a serverless manner, and storing your workflow's artifacts using Vertex ML Metadata. By storing the artifacts of your ML workflow in Vertex ML Metadata, you can analyze the lineage of your workflow's artifacts — for example, an ML model's lineage may include the training data, hyperparameters, and code that were used to create the model."
upvoted 1 times
...
TNT87
2 years, 2 months ago
https://cloud.google.com/vertex-ai/docs/ml-metadata/analyzing
upvoted 1 times
...
Scipione_
2 years, 2 months ago
Selected Answer: A
Your goal is to use API to query results while minimizing manual effort. The answer 'A' achieves the goal and requires less manual effort
upvoted 1 times
...
RaghavAI
2 years, 3 months ago
Selected Answer: A
It's A - Use Vertex AI Pipelines to execute the experiments. Query the results stored in MetadataStore using the Vertex AI API.
upvoted 1 times
...

Topic 1 Question 148

Exam Professional Machine Learning Engineer topic 1 question 148 discussion

You are training an ML model using data stored in BigQuery that contains several values that are considered Personally Identifiable Information (PII). You need to reduce the sensitivity of the dataset before training your model. Every column is critical to your model. How should you proceed?

  • A. Using Dataflow, ingest the columns with sensitive data from BigQuery, and then randomize the values in each sensitive column.
  • B. Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow with the DLP API to encrypt sensitive values with Format Preserving Encryption.
  • C. Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow to replace all sensitive data by using the encryption algorithm AES-256 with a salt.
  • D. Before training, use BigQuery to select only the columns that do not contain sensitive data. Create an authorized view of the data so that sensitive values cannot be accessed by unauthorized individuals.
Suggested Answer: B 🗳️

Comments

AnnaR
Highly Voted 1 year ago
Selected Answer: B
Not A: Randomizing values alters the data in a way that can significantly degrade its utility for machine learning (it does not preserve the original distributions, ...).
Not C: Encryption with AES-256 secures the data but does not preserve the format, which would make the data unusable for ML models.
Not D: This ignores columns with sensitive data, which is not viable here since every column is critical to the model. Also, creating an authorized view restricts access but does not alter the data itself, so it does not address the need to reduce data sensitivity for model training.
upvoted 5 times
...
M25
Most Recent 2 years ago
Selected Answer: B
https://cloud.google.com/dlp/docs/transformations-reference#types_of_de-identification_techniques https://cloud.google.com/dlp/docs/transformations-reference#crypto
upvoted 1 times
...
TNT87
2 years, 2 months ago
Selected Answer: B
Answer B
upvoted 1 times
TNT87
2 years, 2 months ago
The Cloud Data Loss Prevention (DLP) API can scan for sensitive data in the dataset and can help to encrypt the sensitive values using Format Preserving Encryption. This approach preserves the data distribution and format, enabling the model to maintain its accuracy. Additionally, using Dataflow with the DLP API can help to process the data efficiently at scale.
upvoted 2 times
...
...
Scipione_
2 years, 2 months ago
Selected Answer: B
Format Preserving Encryption uses deidentify configuration in which you can specify the param wrapped_key (the encrypted ('wrapped') AES-256 key to use). Answer is B according to me. Ref: https://cloud.google.com/dlp/docs/samples/dlp-deidentify-fpe
upvoted 3 times
...
TNT87
2 years, 2 months ago
Selected Answer: D
This approach would allow you to keep the critical columns of data while reducing the sensitivity of the dataset by removing the personally identifiable information (PII) before training the model. By creating an authorized view of the data, you can ensure that sensitive values cannot be accessed by unauthorized individuals. https://cloud.google.com/bigquery/docs/data-governance#data_loss_prevention
upvoted 2 times
TNT87
2 years, 2 months ago
Actually its B
upvoted 1 times
...
alelamb
2 years, 2 months ago
It says "every" column is critical to your model, why would select specific columns?
upvoted 2 times
TNT87
2 years, 2 months ago
Hence I provided a link; that should answer your question. It says "BigQuery that contains several values that are considered Personally Identifiable Information (PII)". I don't know where you are getting it wrong. The "every" means you can't leave out the sensitive data when training your model, because every column is critical. It's not difficult.
upvoted 1 times
...
...
...
RaghavAI
2 years, 3 months ago
Selected Answer: B
https://cloud.google.com/dlp/docs/samples/dlp-deidentify-fpe
upvoted 1 times
...
imamapri
2 years, 3 months ago
Selected Answer: C
Vote C. https://cloud.google.com/dlp/docs/samples/dlp-deidentify-fpe
upvoted 1 times
...
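For reference, the FPE approach from answer B and the dlp-deidentify-fpe sample linked above boils down to a de-identify configuration like the following. This is only a sketch of the request-body shape: field paths follow the Cloud DLP v2 API (snake_case as in the Python client), and the project, key ring, and wrapped key values are placeholders, not real resources.

```python
# Sketch of a DLP de-identify config using Format Preserving Encryption (FPE)
# with a KMS-wrapped AES-256 key. All resource names/values are placeholders.
deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {
                "primitive_transformation": {
                    "crypto_replace_ffx_fpe_config": {
                        "crypto_key": {
                            "kms_wrapped": {
                                "wrapped_key": "WRAPPED_AES_256_KEY",
                                "crypto_key_name": (
                                    "projects/my-project/locations/global/"
                                    "keyRings/my-kr/cryptoKeys/my-key"
                                ),
                            }
                        },
                        # Output keeps the input's alphabet/length, so the
                        # encrypted values remain usable as model features.
                        "common_alphabet": "ALPHA_NUMERIC",
                    }
                }
            }
        ]
    }
}
# In a Dataflow pipeline, each record would be passed to something like
# dlp_client.deidentify_content(request={"parent": ...,
#     "deidentify_config": deidentify_config, "item": ...}).
```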

Topic 1 Question 149

Exam Professional Machine Learning Engineer topic 1 question 149 discussion

You recently deployed an ML model. Three months after deployment, you notice that your model is underperforming on certain subgroups, thus potentially leading to biased results. You suspect that the inequitable performance is due to class imbalances in the training data, but you cannot collect more data. What should you do? (Choose two.)

  • A. Remove training examples of high-performing subgroups, and retrain the model.
  • B. Add an additional objective to penalize the model more for errors made on the minority class, and retrain the model
  • C. Remove the features that have the highest correlations with the majority class.
  • D. Upsample or reweight your existing training data, and retrain the model
  • E. Redeploy the model, and provide a label explaining the model's behavior to users.
Suggested Answer: BD 🗳️

Comments

fitri001
Highly Voted 1 year ago
Selected Answer: BD
Penalizing Errors on Minority Class (B): This technique, also known as cost-sensitive learning, modifies the loss function during training. Assigning a higher penalty to misclassifications of the minority class steers the model to prioritize learning from those examples.
Upsampling/Reweighting Training Data (D): Upsampling increases the representation of the minority class in the training data by duplicating existing data points. Reweighting assigns higher weights to data points from the minority class during training, making their influence more significant.
upvoted 5 times
fitri001
1 year ago
A. Removing High-Performing Subgroup Examples: This removes valuable data and can worsen overall model performance.
C. Removing High-Correlation Features: This might eliminate informative features and could negatively impact model accuracy.
E. Redeploying with Explanation: While transparency is essential, it doesn't address the underlying performance disparity.
upvoted 3 times
...
...
Carlose2108
Most Recent 1 year, 2 months ago
Selected Answer: BD
I went B & D.
upvoted 1 times
...
PST21
1 year, 9 months ago
D. Upsample or reweight your existing training data, and retrain the model. E. Redeploy the model, and provide a label explaining the model's behavior to users.
Option D: Upsampling or reweighting your existing training data and retraining the model can help address the class imbalance issue and improve the performance on certain subgroups. By duplicating or adjusting the weights of samples from the minority class, the model will receive more exposure to these samples during training, leading to better learning and performance on the underrepresented subgroups.
Option E: Redeploying the model and providing a label explaining the model's behavior to users is essential for transparency and accountability. If the model exhibits biased behavior or inequitable performance on certain subgroups, informing users about this issue can help them interpret the model's predictions more effectively and make informed decisions based on the model's output.
upvoted 1 times
...
M25
2 years ago
Selected Answer: BD
Went with B, D
upvoted 2 times
...
hakook
2 years, 2 months ago
Selected Answer: BD
should be B,D
upvoted 2 times
...
TNT87
2 years, 2 months ago
Selected Answer: BD
Option B and D could be good approaches to address the issue. B. Adding an additional objective to penalize the model more for errors made on the minority class can help the model to focus more on correctly classifying the underrepresented class. D. Upsampling or reweighting the existing training data can help balance the class distribution and increase the model's sensitivity to the underrepresented class.
upvoted 4 times
...
TNT87
2 years, 2 months ago
https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
upvoted 2 times
TNT87
2 years, 2 months ago
https://www.analyticsvidhya.com/blog/2020/07/10-techniques-to-deal-with-class-imbalance-in-machine-learning/
upvoted 2 times
...
...
John_Pongthorn
2 years, 2 months ago
Selected Answer: BD
https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
upvoted 1 times
...
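The two chosen fixes (B: penalize minority-class errors more, D: upsample/reweight) can be sketched library-free. The function names and toy labels below are mine, for illustration only; in practice the weights would feed a loss function's `sample_weight` and the upsampling would happen before training.

```python
import random
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-example weights: rarer classes weigh more (cost-sensitive flavor of B)."""
    counts = Counter(labels)
    n_classes = len(counts)
    return [len(labels) / (n_classes * counts[y]) for y in labels]

def upsample_minority(examples, labels, minority):
    """Duplicate minority-class examples until classes balance (option D)."""
    counts = Counter(labels)
    majority_n = max(counts.values())
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    extra = random.choices(minority_idx, k=majority_n - counts[minority])
    return (examples + [examples[i] for i in extra],
            labels + [labels[i] for i in extra])

labels = [0] * 8 + [1] * 2          # skewed 80/20 toy dataset
weights = inverse_frequency_weights(labels)  # majority 0.625, minority 2.5
```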

Topic 1 Question 150

Exam Professional Machine Learning Engineer topic 1 question 150 discussion

You are working on a binary classification ML algorithm that detects whether an image of a classified scanned document contains a company’s logo. In the dataset, 96% of examples don’t have the logo, so the dataset is very skewed. Which metric would give you the most confidence in your model?

  • A. Precision
  • B. Recall
  • C. RMSE
  • D. F1 score
Suggested Answer: D 🗳️

Comments

PST21
Highly Voted 1 year, 9 months ago
B. Recall.
In a highly imbalanced dataset like the one described (96% of examples are in the negative class), the metric that would give the most confidence in the model's performance is recall. Recall (also known as sensitivity or true positive rate) is the proportion of actual positive cases that were correctly identified by the model. In this context, it means the percentage of images containing the company's logo that the model correctly classified as positive out of all the actual positive cases. Since the dataset is heavily skewed, a high recall value would indicate that the model is effectively capturing the positive cases (images with the logo) despite the class imbalance.
The F1 score (D) is a balance between precision and recall and is a useful metric for imbalanced datasets. However, in this specific case, recall is more important because we want to be confident in detecting the logo images, even if it comes at the cost of having some false positives (lower precision).
upvoted 9 times
vale_76_na_xxx
1 year, 5 months ago
I go for B as well
upvoted 1 times
...
...
fitri001
Highly Voted 1 year ago
Selected Answer: D
Precision vs. Recall: Precision focuses on the percentage of predicted positive cases (logo present) that are actually correct. Recall emphasizes the model's ability to identify all actual positive cases (correctly identifying all logos). In a highly imbalanced dataset, a naive model could simply predict "no logo" for every image and achieve very high accuracy (almost 96%!). However, this wouldn't be a useful model, since it would miss all the actual logos (low recall).
F1 Score: The F1 score strikes a balance between precision and recall. It takes the harmonic mean of these two metrics, giving a more comprehensive picture of the model's performance in both identifying logos (recall) and avoiding false positives (precision).
upvoted 9 times
AzureDP900
11 months, 3 weeks ago
very well explained!
upvoted 1 times
...
...
qaz09
Most Recent 4 months, 1 week ago
Selected Answer: B
This is imbalanced dataset: Positive class - 4% of examples Negative class - 96% of examples We want to lower FN, hence we should use recall. Well explained in: https://developers.google.com/machine-learning/crash-course/classification/accuracy-precision-recall
upvoted 2 times
...
8619d79
9 months, 1 week ago
Selected Answer: B
The focus here is on detecting images with the logo (the minority class), so recall is the right metric. If the question had highlighted that detecting images without the logo also matters, I would have voted for D. But that is not the case here. And of course F1 prevents a model that always predicts "with logo" from scoring well, but that is more about how the metric is used and interpreted. Otherwise, when would recall ever be useful?
upvoted 1 times
...
gscharly
1 year ago
Selected Answer: D
Went with D
upvoted 1 times
...
Yan_X
1 year, 1 month ago
Selected Answer: B
B. See #90; it should be an F score that weights recall more than precision.
upvoted 4 times
...
CHARLIE2108
1 year, 1 month ago
Selected Answer: B
I went with B.
upvoted 1 times
...
vaibavi
1 year, 3 months ago
Selected Answer: D
F1 score provides a comprehensive evaluation by penalizing models that excel in just one aspect at the expense of the other. By considering both precision and recall, it helps identify models that effectively balance true positive identification with minimal false positives, making it a more suitable metric for imbalanced data like your logo detection problem.
upvoted 2 times
...
M25
2 years ago
Selected Answer: D
See #90!
upvoted 2 times
...
FherRO
2 years, 2 months ago
Selected Answer: D
F1 score works well for imbalanced data sets
upvoted 1 times
...
TNT87
2 years, 2 months ago
Selected Answer: D
https://stephenallwright.com/imbalanced-data-metric/
upvoted 2 times
...
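The recall-vs-F1 debate above is easy to ground with numbers. A tiny helper (the function name and counts are mine) shows why neither side argues for accuracy on a 96/4 split:

```python
def prf1(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Degenerate "always predict no-logo" model on 100 images (96 negatives):
# 96% accuracy, yet recall and F1 are both 0 -- accuracy tells you nothing.
naive = prf1(tp=0, fp=0, fn=4, tn=96)
```

Whether recall alone (catch every logo, tolerate false positives) or F1 (also penalize false positives) is "most confident" is exactly the judgment call the thread is split on.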

Topic 1 Question 151

Exam Professional Machine Learning Engineer topic 1 question 151 discussion

While running a model training pipeline on Vertex AI, you discover that the evaluation step is failing because of an out-of-memory error. You are currently using TensorFlow Model Analysis (TFMA) with a standard Evaluator TensorFlow Extended (TFX) pipeline component for the evaluation step. You want to stabilize the pipeline without downgrading the evaluation quality while minimizing infrastructure overhead. What should you do?

  • A. Include the flag -runner=DataflowRunner in beam_pipeline_args to run the evaluation step on Dataflow.
  • B. Move the evaluation step out of your pipeline and run it on custom Compute Engine VMs with sufficient memory.
  • C. Migrate your pipeline to Kubeflow hosted on Google Kubernetes Engine, and specify the appropriate node parameters for the evaluation step.
  • D. Add tfma.MetricsSpec() to limit the number of metrics in the evaluation step.
Suggested Answer: A 🗳️

Comments

MultipleWorkerMirroredStrategy
Highly Voted 2 years ago
Selected Answer: A
"Evaluator leverages the TensorFlow Model Analysis library to perform the analysis, which in turn use Apache Beam for scalable processing." Since Dataflow is Google Cloud's serverless Apache Beam offering, this option can easily be implemented to address the issue while leaving the evaluation logic as such identical https://www.tensorflow.org/tfx/guide/evaluator#evaluator_and_tensorflow_model_analysis
upvoted 7 times
pico
1 year, 11 months ago
If we have to add dataflow then this condition is not met: minimizing infrastructure overhead
upvoted 1 times
Zepopo
1 year, 7 months ago
No, it is. If we choose another option:
B: you need to configure VMs and migrate all workloads.
C: also overhead from migrating.
D: downgrades the evaluation quality.
So just switching the runner seems a very easy option.
upvoted 3 times
...
f084277
12 months ago
Dataflow requires no infrastructure management.
upvoted 1 times
...
...
...
M25
Highly Voted 2 years, 6 months ago
Selected Answer: A
Links already provided below: “That works fine for one hundred records, but what if the goal was to process all 187,002,0025 rows in the dataset? For this, the pipeline is switched from the DirectRunner to the production Dataflow runner.” [Option A] https://blog.tensorflow.org/2020/03/tensorflow-extended-tfx-using-apache-beam-large-scale-data-processing.html. "Metrics to configure (only required if additional metrics are being added outside of those saved with the model).” https://www.tensorflow.org/tfx/guide/evaluator#using_the_evaluator_component will thus add, not “limit the number of metrics in the evaluation step”. [Option D]
upvoted 6 times
...
NamitSehgal
Most Recent 8 months, 3 weeks ago
Selected Answer: A
TFMA Integration with Dataflow
upvoted 1 times
...
gscharly
1 year, 6 months ago
Selected Answer: A
With D we're downgrading the evaluation. Dataflow is serverless, so no infrastructure overhead is added.
upvoted 2 times
...
pico
1 year, 11 months ago
Selected Answer: D
Limiting Metrics: TensorFlow Model Analysis (TFMA) allows you to define a subset of metrics that you are interested in during the evaluation step. By using tfma.MetricsSpec(), you can specify a subset of metrics to be computed during the evaluation, which can help reduce the memory requirements. Out-of-Memory Error: Out-of-memory errors during model evaluation often occur when the system is trying to compute and store a large number of metrics, especially if the model or dataset is large. By limiting the number of metrics using tfma.MetricsSpec(), you can potentially reduce the memory footprint and resolve the out-of-memory error.
upvoted 2 times
...
PST21
2 years, 3 months ago
Based on the question's context, the correct option to stabilize the pipeline without downgrading the evaluation quality while minimizing infrastructure overhead is: D. Add tfma.MetricsSpec() to limit the number of metrics in the evaluation step. The question specifies that the evaluation step is failing due to an out-of-memory error. In such a scenario, limiting the number of metrics to be computed during evaluation using tfma.MetricsSpec() can help reduce memory requirements and potentially resolve the out-of-memory issue.
upvoted 1 times
...
tavva_prudhvi
2 years, 4 months ago
Selected Answer: D
By adding tfma.MetricsSpec(), you can limit the number of metrics that are computed during the evaluation step, thus reducing the memory requirement. This will help stabilize the pipeline without downgrading the evaluation quality, while minimizing infrastructure overhead. This option is a quick and easy solution that can be implemented without significant changes to the pipeline or infrastructure. Option A: Including the flag -runner=DataflowRunner in beam_pipeline_args to run the evaluation step on Dataflow may help to increase memory availability, but it may also increase infrastructure overhead.
upvoted 1 times
tavva_prudhvi
2 years, 3 months ago
It seems that while Option D might reduce memory usage, it could potentially compromise the evaluation quality by not considering all the necessary metrics. Confused between A and D!
upvoted 1 times
...
...
Gudwin
2 years, 6 months ago
Selected Answer: D
D does not harm the evaluation quality.
upvoted 1 times
...
frangm23
2 years, 6 months ago
I'm not very sure, but wouldn't it be A? D is degrading evaluation quality (if you're getting fewer metrics, then the evaluation is worse, or at least less complete).
upvoted 2 times
...
Yajnas_arpohc
2 years, 7 months ago
Selected Answer: A
TFX 0.30 and above adds an interface, with_beam_pipeline_args, for extending the pipeline-level Beam args per component. tfma.MetricsSpec() out of the box has recommended metrics; reducing any further might not serve the purpose.
upvoted 2 times
...
TNT87
2 years, 8 months ago
Selected Answer: D
Add tfma.MetricsSpec () to limit the number of metrics in the evaluation step. Limiting the number of metrics in the evaluation step using tfma.MetricsSpec() can reduce the memory usage during evaluation and address the out-of-memory error. This can help stabilize the pipeline without downgrading the evaluation quality or incurring additional infrastructure overhead. Running the evaluation step on Dataflow or custom Compute Engine VMs can be resource-intensive and expensive, while migrating the pipeline to Kubeflow would require additional setup and configuration. ANSWER D
upvoted 4 times
...
Ml06
2 years, 8 months ago
A is wrong; it does not even make sense. The default runner for the Evaluator component of TFX is Dataflow, so setting the runner to Dataflow does not change anything. The answer is D because it does not involve any infrastructure manipulation and reduces the memory used by the TFX component.
upvoted 2 times
TNT87
2 years, 8 months ago
https://www.tensorflow.org/tfx/guide/evaluator
upvoted 1 times
...
f084277
12 months ago
It uses the beam local runner by default, not Dataflow. You are wrong.
upvoted 1 times
...
...
TNT87
2 years, 8 months ago
Selected Answer: A
Answer A
upvoted 2 times
TNT87
2 years, 8 months ago
Answer D
upvoted 1 times
...
...
RaghavAI
2 years, 9 months ago
Selected Answer: A
https://blog.tensorflow.org/2020/03/tensorflow-extended-tfx-using-apache-beam-large-scale-data-processing.html
upvoted 4 times
...
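For option A, the wiring amounts to a list of Beam pipeline options passed to the TFX pipeline. A hedged sketch: the project, region, and bucket below are placeholders, the flag names are standard Beam/Dataflow pipeline options, and note that Beam itself spells the runner flag with two dashes.

```python
# Option A: run Beam-based components (like the TFMA Evaluator) on Dataflow,
# which scales out the evaluation instead of running it in-process.
# All resource names are placeholders.
beam_pipeline_args = [
    "--runner=DataflowRunner",
    "--project=my-gcp-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/tmp",
]
# These are then passed when constructing the pipeline, along the lines of:
# tfx.dsl.Pipeline(..., beam_pipeline_args=beam_pipeline_args)
```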

Topic 1 Question 152

Exam Professional Machine Learning Engineer topic 1 question 152 discussion

You are developing an ML model using a dataset with categorical input variables. You have randomly split half of the data into training and test sets. After applying one-hot encoding on the categorical variables in the training set, you discover that one categorical variable is missing from the test set. What should you do?

  • A. Use sparse representation in the test set.
  • B. Randomly redistribute the data, with 70% for the training set and 30% for the test set
  • C. Apply one-hot encoding on the categorical variables in the test data
  • D. Collect more data representing all categories
Suggested Answer: C 🗳️

Comments

Fer660
2 months, 2 weeks ago
Selected Answer: D
None of the alternatives are fully correct. This question should be flagged for review.
Not A: if the model is trained on one-hot, how will it work with sparse? You could make some workaround, but this is brittle.
Not B: likely to make the situation even worse.
Not C: then what exactly will the model do when it finds the extra category in the training set?
Not D: we may or may not be in a position to go get extra data.
upvoted 1 times
...
NamitSehgal
8 months, 3 weeks ago
Selected Answer: C
The crucial point is that the same encoding scheme must be applied to both the training and test sets.
upvoted 1 times
...
baimus
1 year, 2 months ago
Selected Answer: C
I've very grudgingly ticked C, as the question is missing "handle the missing category by one-hot encoding all zeros for the missing feature column". It otherwise doesn't make sense, as the encoding will have the wrong number of entries.
upvoted 1 times
Futtsie
8 months, 1 week ago
Yes I agree! Wish they worded it better.
upvoted 1 times
...
...
fitri001
1 year, 6 months ago
Selected Answer: C
The correct approach is to handle the missing category during one-hot encoding of the test data. Here's how to address this issue:
1. Identify the missing category: after applying one-hot encoding to the training set, compare the categories (unique values) present in the training data with the categories in the test data. This will reveal the missing category.
2. Add a column for the missing category in the test data: include a new column in the test data specifically for the missing category, and initialize its values with 0.
3. Apply one-hot encoding to the test data: now that the test data includes a column for the missing category, proceed with one-hot encoding the categorical variables in the test data. This ensures the test data has the same structure as the encoded training data.
upvoted 2 times
baimus
1 year, 2 months ago
But your description includes a missing critical step that the question is missing to make it make sense.
upvoted 1 times
...
...
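In code, the procedure described above amounts to fitting the encoding on the training categories only and reusing that mapping on the test set. A minimal pure-Python sketch (the category names are made up for illustration):

```python
# Fit the one-hot mapping on the TRAINING set, then apply that same mapping
# to the test set, so a category absent from the test set still produces a
# (all-zero) column and train/test shapes match.

def fit_one_hot(train_values):
    """Learn the category -> column index mapping from the training set."""
    categories = sorted(set(train_values))
    return {cat: i for i, cat in enumerate(categories)}

def transform_one_hot(mapping, values):
    """Encode values with the training mapping; unseen values become all zeros."""
    encoded = []
    for v in values:
        row = [0] * len(mapping)
        if v in mapping:  # a value not in the mapping stays all-zero
            row[mapping[v]] = 1
        encoded.append(row)
    return encoded

train = ["red", "green", "blue", "green"]
test = ["red", "blue"]  # "green" never appears in the test set

mapping = fit_one_hot(train)            # blue=0, green=1, red=2
test_encoded = transform_one_hot(mapping, test)
# Every test row has len(mapping) == 3 columns, matching the training encoding
```

In practice, scikit-learn's OneHotEncoder fit on the training set (with handle_unknown="ignore") gives the same behavior without hand-rolling the mapping.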
CHARLIE2108
1 year, 9 months ago
Selected Answer: C
Answer C
upvoted 1 times
...
Nxtgen
2 years, 2 months ago
Selected Answer: C
Answer options analysis:
C. Since one categorical variable is missing from the test set, apply the one-hot encoding fitted on the training set to the test set; for categories not present in the test set we would just obtain an array of all 0's, so that would be OK.
D. That data collection might not be feasible depending on the real-world problem.
B. Randomness would not always fix the problem.
A. Not recommended to use different representations for train/test. Sparse representation doesn't magically recover missing categories; it's a way to efficiently store data with a large number of zeros.
I would go with C.
upvoted 4 times
...
SamuelTsch
2 years, 4 months ago
Selected Answer: C
C but not really sure
upvoted 1 times
...
Scipione_
2 years, 5 months ago
Selected Answer: C
You must apply one-hot encoding also to the test dataset. However, I find this answer incomplete.
upvoted 3 times
baimus
1 year, 2 months ago
Yeah, 100% - it's missing the "but make sure it deals with the missing category" part, e.g. by encoding it as all zeros so the one-hot representation has the right number of items.
upvoted 1 times
...
...
nescafe7
2 years, 5 months ago
Selected Answer: D
Add data to the test set to get the same OHE
upvoted 3 times
tavva_prudhvi
2 years, 4 months ago
Option D (collecting more data) may not be feasible or necessary if the missing category is not significant or if one-hot encoding is sufficient to handle it.
upvoted 2 times
...
...
M25
2 years, 6 months ago
Selected Answer: B
“Rows are selected for a data split randomly, but deterministically. (…) Training a new model with the same training data results in the same data split.” https://cloud.google.com/vertex-ai/docs/tabular-data/data-splits#classification-random. “Randomly redistribute data” [Option B] with different fractions, will result in a different data split. Having a higher fraction split of 70% for the training set will additionally help the model to better generalize (compared to only 50%), thus perform better when testing, the ultimate goal.
upvoted 2 times
maukaba
2 years ago
https://cloud.google.com/vertex-ai/docs/tabular-data/data-splits#classification-random I think it's applicable to VertexAI auto ML only
upvoted 1 times
...
M25
2 years, 6 months ago
Sparse representation is one “in which only nonzero values are stored”, excluding [Option A]: https://developers.google.com/machine-learning/crash-course/representation/feature-engineering#sparse-representation. Applying “one-hot encoding” to the columns will not help finding the missing column, thus excluding [Option C]. No indication provided for a need to “collect more data”, excluding [Option D].
upvoted 1 times
...
julliet
2 years, 5 months ago
It is possible that the category is very rare, and that is the reason we don't have it in the test set. So I guess we should just apply the train-data transformations and use one-hot encoding.
upvoted 2 times
...
...
Gudwin
2 years, 6 months ago
Selected Answer: C
By using a sparse representation, you will be losing the information contained in the missing categorical variable. This could lead to the model making incorrect predictions on the test set.
upvoted 2 times
...
wrosengren
2 years, 6 months ago
I agree with formazioneQI that if a different one-hot encoding is used for the test set than for the train set, the results would be poor. However, there is no problem with not having all categories in the test set as long as all possibilities are present in the training set. So assuming that we use the same mapping in the train and test sets, there is no issue. If we don't encode the test set, the variable is meaningless anyway, so I would lean C.
upvoted 1 times
...
formazioneQI
2 years, 6 months ago
Selected Answer: A
Since one categorical variable is missing from the test set, C would result in a different number of columns in the training and test sets.
upvoted 3 times
tavva_prudhvi
2 years, 4 months ago
Option A (sparse representation) may not work well in this case, as it can lead to sparsity issues and affect the model's performance.
upvoted 1 times
...
...
TNT87
2 years, 8 months ago
C. Apply one-hot encoding on the categorical variables in the test data. When using one-hot encoding on categorical variables, each unique value of the variable is represented as a separate binary variable. Therefore, it is important to ensure that the same set of binary variables is present in both the training and test datasets. Since one categorical variable is missing in the test set, the recommended approach is to apply one-hot encoding on the categorical variables in the test set to ensure that the same set of binary variables is present in both datasets.
upvoted 2 times
...
TNT87
2 years, 8 months ago
Selected Answer: C
Answer C
upvoted 1 times
...

Topic 1 Question 153


You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are identified as fraudulent. Which data transformation strategy would likely improve the performance of your classifier?

  • A. Modify the target variable using the Box-Cox transformation.
  • B. Z-normalize all the numeric features.
  • C. Oversample the fraudulent transaction 10 times.
  • D. Log transform all numeric features.
Suggested Answer: C 🗳️

Comments

Scipione_
Highly Voted 2 years, 2 months ago
Selected Answer: C
The answer is C because it's the only way to improve model performance.
Box-Cox transformation: transforms feature values toward a normal distribution.
Z-normalization: transforms feature values as x_new = (x − μ) / σ (so the x_new have mean 0 and std dev 1).
Log transform: just a log transformation.
Also, the random forest algorithm is not a distance-based model but a tree-based model, so there's no need for a normalization step.
upvoted 5 times
...
fitri001
Most Recent 1 year ago
Selected Answer: C
Oversampling is a common technique to address class imbalance and can significantly improve the performance of the random forest model in fraud detection. It's important to note that oversampling can lead to overfitting, so monitoring the model's performance on unseen data (validation set) is crucial. You might also consider exploring other techniques like undersampling the majority class or using SMOTE (Synthetic Minority Oversampling Technique) for a more balanced approach.
upvoted 3 times
fitri001
1 year ago
Class Imbalance: The dataset has a significant class imbalance, with only 1% of transactions being fraudulent (minority class). Random forest models can be biased towards the majority class during training. Oversampling: Oversampling replicates instances from the minority class (fraudulent transactions) in this case. By increasing the representation of the fraudulent class (10 times in this scenario), the model is exposed to more examples of fraud, improving its ability to learn and detect fraudulent patterns.
upvoted 1 times
...
...
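A minimal sketch of what answer C looks like in practice, using made-up toy transactions at the question's 1% fraud rate. Replicating each fraud row 10x is the naive form; SMOTE (via imbalanced-learn) is the synthetic alternative mentioned above.

```python
import random

random.seed(0)

# Toy transactions: (features, label) with 1% fraud; values are hypothetical
transactions = [([i, i * 2], 0) for i in range(99)] + [([999, 1998], 1)]

fraud = [t for t in transactions if t[1] == 1]
legit = [t for t in transactions if t[1] == 0]

# Oversample the fraudulent class 10x: duplicate each fraud row 10 times
oversampled = legit + fraud * 10
random.shuffle(oversampled)

fraud_ratio = sum(1 for _, y in oversampled if y == 1) / len(oversampled)
# The minority share rises from 1% to 10/109 ≈ 9.2%, giving the random
# forest far more fraudulent examples to split on during training
```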
pinimichele01
1 year, 1 month ago
Selected Answer: C
See #60!
upvoted 1 times
...
M25
2 years ago
Selected Answer: C
See #60! The End. Good luck everyone!!!
upvoted 2 times
...
TNT87
2 years, 3 months ago
Selected Answer: C
https://towardsdatascience.com/how-to-build-a-machine-learning-model-to-identify-credit-card-fraud-in-5-stepsa-hands-on-modeling-5140b3bd19f1
upvoted 1 times
...

Topic 1 Question 154


You are developing a classification model to support predictions for your company’s various products. The dataset you were given for model development has class imbalance. You need to minimize false positives and false negatives. What evaluation metric should you use to properly train the model?

  • A. F1 score
  • B. Recall
  • C. Accuracy
  • D. Precision
Suggested Answer: A 🗳️

Comments

Antmal
Highly Voted 2 years, 6 months ago
Selected Answer: A
If there wasn't a class imbalance, then C (Accuracy) would have been the right answer. Instead, A (F1 score), the harmonic mean of precision and recall, balances the trade-off between the two. It is useful when both false positives and false negatives are important, as in the question at hand, and you want to optimize for both.
upvoted 7 times
...
AzureDP900
Most Recent 1 year, 4 months ago
In this case, you want to minimize both false positives and false negatives. The F1 score balances precision and recall, and therefore accounts for both false positives and false negatives, making it a suitable choice for evaluating your model.
upvoted 2 times
...
fitri001
1 year, 6 months ago
Selected Answer: A
Class imbalance: when dealing with imbalanced data, metrics like accuracy can be misleading. A model that simply predicts the majority class all the time can achieve high accuracy, but it wouldn't be very useful for identifying the minority class (which is likely more important in this scenario).
F1 score: the F1 score is the harmonic mean of precision and recall. Precision measures the proportion of positive predictions that are actually correct, while recall measures the proportion of actual positive cases that are correctly identified. By considering both metrics, the F1 score provides a balanced view of the model's performance in identifying both positive and negative cases.
Minimizing false positives and false negatives: since a high F1 score indicates a good balance between precision and recall, it translates to minimizing both false positives (incorrect positive predictions) and false negatives (missed positive cases).
upvoted 4 times
...
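The accuracy-vs-F1 argument above can be checked by hand on a small made-up example: a 95/5 class split and a degenerate classifier that always predicts the majority class.

```python
# Hypothetical predictions on an imbalanced set: 5 positives, 95 negatives.
# A "predict everything negative" model looks great on accuracy but has F1 = 0.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100  # degenerate majority-class model

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# accuracy == 0.95 yet f1 == 0.0: F1 exposes the useless classifier,
# because it depends on the false positives and false negatives that
# accuracy hides under class imbalance
```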
PST21
2 years, 3 months ago
Recall (True Positive Rate): It measures the ability of the model to correctly identify all positive instances out of the total actual positive instances. High recall means fewer false negatives, which is desired when minimizing the risk of missing important positive cases. F1 Score: It is the harmonic mean of precision and recall. F1 score gives equal weight to both precision and recall and is suitable when you want a balanced metric. However, it might not be the best choice when the primary focus is on minimizing false positives and false negatives.
upvoted 1 times
...
PST21
2 years, 3 months ago
both recall and F1 score are valuable metrics, but based on the question's specific requirement to minimize false positives and false negatives, recall (Option B) is the best answer. It directly focuses on reducing false negatives, which is crucial when dealing with class imbalance and minimizing the risk of missing important positive cases.
upvoted 1 times
...
SamuelTsch
2 years, 4 months ago
Selected Answer: A
F1 should be correct
upvoted 1 times
...
nescafe7
2 years, 5 months ago
class imbalance = F1 score
upvoted 1 times
...

Topic 1 Question 155


You are training an object detection machine learning model on a dataset that consists of three million X-ray images, each roughly 2 GB in size. You are using Vertex AI Training to run a custom training application on a Compute Engine instance with 32-cores, 128 GB of RAM, and 1 NVIDIA P100 GPU. You notice that model training is taking a very long time. You want to decrease training time without sacrificing model performance. What should you do?

  • A. Increase the instance memory to 512 GB, and increase the batch size.
  • B. Replace the NVIDIA P100 GPU with a K80 GPU in the training job.
  • C. Enable early stopping in your Vertex AI Training job.
  • D. Use the tf.distribute.Strategy API and run a distributed training job.
Suggested Answer: D 🗳️

Comments

fitri001
Highly Voted 1 year, 6 months ago
Selected Answer: D
Large dataset: with millions of images, training on a single machine can be very slow. Distributed training allows you to split the training data and workload across multiple machines or accelerators, significantly speeding up the process.
Vertex AI Training and tf.distribute: Vertex AI Training supports TensorFlow, and the tf.distribute library provides tools for implementing distributed training strategies, so you are not limited to the single NVIDIA P100 GPU on the current 32-core instance.
upvoted 5 times
...
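A hedged sketch of what answer D looks like with tf.distribute. The model architecture, input shape, and batch size are hypothetical, and this assumes TensorFlow with multiple accelerators available (the pattern only pays off once there is more than one replica):

```python
import tensorflow as tf

# MirroredStrategy replicates the model across all GPUs on one machine;
# for multiple machines on Vertex AI Training, swap in
# tf.distribute.MultiWorkerMirroredStrategy.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Variables created inside the scope are mirrored across replicas
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 1)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(4),  # e.g. a bounding-box regression head
    ])
    model.compile(optimizer="adam", loss="mse")

# Scale the global batch size with the replica count so each GPU stays busy
global_batch = 64 * strategy.num_replicas_in_sync
# dataset = ...  # tf.data pipeline: .batch(global_batch).prefetch(tf.data.AUTOTUNE)
# model.fit(dataset, epochs=10)
```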
Fer660
Most Recent 2 months, 2 weeks ago
Selected Answer: D
I guess D is what they expect. But: the question does not say that we are training a TF model, so this is misleading.
upvoted 1 times
...
baimus
1 year, 2 months ago
Selected Answer: D
Some strategies, like tf.distribute.MirroredStrategy, can provide performance optimizations even on a single GPU. For example, it can take advantage of better gradient computation or data parallelism during backpropagation, which can slightly optimize performance.
upvoted 1 times
...
Prakzz
1 year, 4 months ago
Same Question as 96?
upvoted 1 times
...
pinimichele01
1 year, 6 months ago
Selected Answer: D
https://www.tensorflow.org/guide/distributed_training#onedevicestrategy
upvoted 1 times
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: D
D. Use the tf.distribute.Strategy API and run a distributed training job. Here's why:
A. Increase instance memory and batch size: this might not be helpful. While increasing memory could help with loading more images at once, the main bottleneck here is likely processing these large images. Increasing the batch size can worsen the problem by further straining the GPU's memory.
B. Replace P100 with K80 GPU: a weaker GPU would likely slow down training instead of speeding it up.
C. Enable early stopping: this can save time but might stop training before reaching optimal performance.
D. Use tf.distribute.Strategy: this allows you to distribute the training workload across multiple GPUs or cores, significantly accelerating training without changing the model itself. This effectively leverages the available hardware.
upvoted 4 times
...
bcama
2 years, 2 months ago
Selected Answer: D
Perhaps the use of a second or more GPUs is implied, and the answer is D: https://codelabs.developers.google.com/vertex_multiworker_training#2
upvoted 1 times
...
ciro_li
2 years, 3 months ago
Selected Answer: D
https://www.tensorflow.org/guide/gpu ?
upvoted 1 times
ciro_li
2 years, 3 months ago
I was wrong. It's A.
upvoted 1 times
...
...
PST21
2 years, 3 months ago
Selected Answer: D
to decrease training time without sacrificing model performance, the best approach is to use the tf.distribute.Strategy API and run a distributed training job, leveraging the capabilities of the available GPU(s) for parallelized training.
upvoted 1 times
...
powerby35
2 years, 4 months ago
Selected Answer: A
A. Since we just have one GPU, we could not use tf.distribute.Strategy as in D.
upvoted 1 times
powerby35
2 years, 4 months ago
And C early stopping maybe hurt the performance
upvoted 1 times
...
TLampr
1 year, 11 months ago
The increased batch size also can hurt the performance if it is not followed by further optimizations with regards to learning rate for example. If early stopping is applied according to common convention, by stopping when the validation loss starts increasing, it should not hurt the performance. However it is not specified in the answer sadly.
upvoted 1 times
...
...

Topic 1 Question 156


You need to build classification workflows over several structured datasets currently stored in BigQuery. Because you will be performing the classification several times, you want to complete the following steps without writing code: exploratory data analysis, feature selection, model building, training, and hyperparameter tuning and serving. What should you do?

  • A. Train a TensorFlow model on Vertex AI.
  • B. Train a classification Vertex AutoML model.
  • C. Run a logistic regression job on BigQuery ML.
  • D. Use scikit-learn in Vertex AI Workbench user-managed notebooks with pandas library.
Suggested Answer: B 🗳️

Comments

fitri001
1 year ago
Selected Answer: B
Vertex AutoML Tables is a managed service specifically designed for building machine learning models from structured data in BigQuery, all without writing code. It automates various stages of the machine learning pipeline, including:
Exploratory data analysis: AutoML Tables performs basic data understanding to identify potential issues.
Feature selection: it can automatically select relevant features for model training.
Model building: AutoML Tables trains and evaluates various machine learning models and chooses the best-performing one for classification.
Hyperparameter tuning: it automatically tunes hyperparameters to optimize model performance.
Serving: you can deploy the trained model for making predictions on new data.
upvoted 3 times
...
ludovikush
1 year, 1 month ago
Selected Answer: B
B, since it's specifying without writing code
upvoted 2 times
...
Carlose2108
1 year, 2 months ago
Selected Answer: B
No writing Code. Option B.
upvoted 1 times
...
Tonygangrade
1 year, 2 months ago
Selected Answer: B
AutoML -> No writing code
upvoted 1 times
...
36bdc1e
1 year, 4 months ago
B. With AutoML we don't write a single line of code.
upvoted 1 times
...
bugger123
1 year, 5 months ago
B is correct. A and D imply writing code in TF and Sklearn respectively. C is writing BQML code for logistic regression as well. Furthermore, how would you do the EDA and feature selection etc. without writing code? AutoML is THE codeless solution automating all the steps mentioned above. https://cloud.google.com/automl?hl=en
upvoted 1 times
...
Nxtgen
1 year, 8 months ago
Selected Answer: B
A and D would require writing code. C. would also imply some “code writing” in BigQuery ML I would go with B.
upvoted 1 times
...
powerby35
1 year, 10 months ago
Selected Answer: B
B "without writing code"
upvoted 1 times
...

Topic 1 Question 157


You recently developed a deep learning model. To test your new model, you trained it for a few epochs on a large dataset. You observe that the training and validation losses barely changed during the training run. You want to quickly debug your model. What should you do first?

  • A. Verify that your model can obtain a low loss on a small subset of the dataset
  • B. Add handcrafted features to inject your domain knowledge into the model
  • C. Use the Vertex AI hyperparameter tuning service to identify a better learning rate
  • D. Use hardware accelerators and train your model for more epochs
Suggested Answer: A 🗳️

Comments

fitri001
Highly Voted 1 year ago
Selected Answer: A
Isolating the issue: training on a small subset helps isolate the problem to the model itself rather than the entire training pipeline or large dataset.
Efficiency: debugging with a small dataset is faster, allowing you to iterate through potential solutions quicker.
Identifying fundamental issues: if the model struggles to learn even on a small dataset, it indicates a more fundamental problem in the model architecture, data preprocessing, or learning algorithm.
upvoted 6 times
...
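The sanity check behind answer A can be sketched without any framework: drive a tiny model to near-zero loss on a handful of points. A pure-Python stand-in (data, learning rate, and iteration count are made up for illustration):

```python
# Debugging step from answer A: verify the training loop can drive the loss
# to ~0 on a few examples. If it can't, the bug is in the model or optimizer,
# not the data scale, so more epochs or hardware (answer D) won't help.

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1, so near-zero loss is achievable

w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):  # plain gradient descent on MSE
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

final_loss = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
# A healthy training loop should drive final_loss very close to 0 here;
# a flat loss on 4 points reproduces the bug far faster than a full run
```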
OpenKnowledge
Most Recent 3 weeks, 6 days ago
Selected Answer: A
A should be the 1st option to try
upvoted 1 times
...
tavva_prudhvi
1 year, 5 months ago
Selected Answer: A
Verifying that your model can obtain a low loss on a small subset of the dataset is a good first step for debugging because it helps you determine if your model is capable of fitting the data and learning from it. If your model cannot fit a small subset of the data, it may indicate issues with the model architecture, initialization, or optimization algorithm. By starting with a small subset, you can identify and fix these issues more quickly, before moving on to larger-scale training and more complex debugging tasks.
upvoted 3 times
...
Mdso
1 year, 9 months ago
Selected Answer: A
I choose A
upvoted 1 times
...
PST21
1 year, 9 months ago
Selected Answer: A
the first step to quickly debug the deep learning model is to verify that it can obtain a low loss on a small subset of the dataset (Option A). If the model fails to achieve good results on the smaller subset, further investigation is required to identify and address potential issues with the model.
upvoted 1 times
...

Topic 1 Question 158


You are a data scientist at an industrial equipment manufacturing company. You are developing a regression model to estimate the power consumption in the company’s manufacturing plants based on sensor data collected from all of the plants. The sensors collect tens of millions of records every day. You need to schedule daily training runs for your model that use all the data collected up to the current date. You want your model to scale smoothly and require minimal development work. What should you do?

  • A. Develop a custom TensorFlow regression model, and optimize it using Vertex AI Training.
  • B. Develop a regression model using BigQuery ML.
  • C. Develop a custom scikit-learn regression model, and optimize it using Vertex AI Training.
  • D. Develop a custom PyTorch regression model, and optimize it using Vertex AI Training.
Suggested Answer: B 🗳️

Comments

OpenKnowledge
3 weeks, 6 days ago
Selected Answer: B
BQML is a serverless, low-code ML solution.
upvoted 1 times
...
VinaoSilva
1 year, 4 months ago
Selected Answer: B
minimal development work + regression model = BigQuery ML
upvoted 3 times
...
AzureDP900
1 year, 4 months ago
B. Develop a regression model using BigQuery ML. You're looking for a solution that scales smoothly and requires minimal development work. BigQuery ML is an excellent choice because it allows you to create machine learning models directly in BigQuery, without the need to write code or set up complex infrastructure.
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: B
Scalability: BigQuery is a serverless data warehouse designed to handle massive datasets. It can efficiently process tens of millions of records daily for model training.
Minimal development work: BigQuery ML offers built-in regression models like linear regression that you can train directly on your data stored in BigQuery. This eliminates the need for extensive custom code development with TensorFlow, PyTorch, or scikit-learn (options A, C, and D).
Daily training runs: BigQuery ML allows scheduling queries for automated model training. You can set up a daily scheduled query to train your model on the latest data.
upvoted 3 times
...
7cb0ab3
1 year, 7 months ago
Selected Answer: B
Minimal development effort can be achieved with BigQuery ML. Also the amount of data is already in BQ.
upvoted 3 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: B
Minimal dev effort => BigQueryML
upvoted 1 times
...
Carlose2108
1 year, 8 months ago
Selected Answer: C
I went C.
upvoted 1 times
...
Mdso
2 years, 3 months ago
Selected Answer: B
Minimal development effort => BigQueryML
upvoted 3 times
...
PST21
2 years, 3 months ago
Selected Answer: B
for scheduling daily training runs with minimal development work and seamless scaling, the best option is to develop a regression model using BigQuery ML (Option B). It allows you to perform model training and inference directly within BigQuery, taking advantage of its distributed processing capabilities to handle large datasets effortlessly.
upvoted 1 times
...
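A minimal sketch of answer B. The dataset, table, and column names below are hypothetical; a BigQuery scheduled query can rerun this statement daily so the model always trains on all data up to the current date.

```sql
-- Retrain a linear regression on all sensor data collected so far.
-- `mydataset`, `sensor_readings`, `power_kw`, and `reading_ts` are made-up names.
CREATE OR REPLACE MODEL `mydataset.power_model`
OPTIONS (
  model_type = 'linear_reg',
  input_label_cols = ['power_kw']
) AS
SELECT *
FROM `mydataset.sensor_readings`
WHERE DATE(reading_ts) <= CURRENT_DATE();
```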

Topic 1 Question 159


Your organization manages an online message board. A few months ago, you discovered an increase in toxic language and bullying on the message board. You deployed an automated text classifier that flags certain comments as toxic or harmful. Now some users are reporting that benign comments referencing their religion are being misclassified as abusive. Upon further inspection, you find that your classifier's false positive rate is higher for comments that reference certain underrepresented religious groups. Your team has a limited budget and is already overextended. What should you do?

  • A. Add synthetic training data where those phrases are used in non-toxic ways.
  • B. Remove the model and replace it with human moderation.
  • C. Replace your model with a different text classifier.
  • D. Raise the threshold for comments to be considered toxic or harmful.
Suggested Answer: A 🗳️

Comments

f084277
Highly Voted 12 months ago
Selected Answer: D
The answer is D. Of course A would be ideal, but it totally ignores the constraints presented in the question that your team is already overextended.
upvoted 5 times
...
NamitSehgal
Most Recent 8 months, 3 weeks ago
Selected Answer: A
By adding synthetic training data that includes benign references to the underrepresented religious groups, you can help the model better understand the context in which these phrases are used and reduce the false positive rate.
upvoted 3 times
...
vini123
9 months, 1 week ago
Selected Answer: A
Option A actively teaches the model to differentiate toxic from non-toxic uses of religious references, reducing bias while maintaining strong moderation.
upvoted 3 times
...
Omi_04040
11 months ago
Selected Answer: A
Main issue is to address the bias in the model
upvoted 1 times
...
Laur_C
11 months ago
Selected Answer: A
I chose A - least expensive/time consuming way to actually solve the problem. D sacrifices model quality and does not change the inherent bias of the model, meaning that the biases would still remain if this solution was chosen. Does not seem like the ethical/best practices solution
upvoted 2 times
...
rajshiv
11 months, 1 week ago
Selected Answer: A
D is not a good answer. Raising the threshold would reduce the number of toxic comments flagged (perhaps lowering false positives), but it would also increase the number of actual toxic comments being missed (higher false negatives). This exacerbates the problem and does not address the bias in the model. I think A is the best answer.
upvoted 3 times
...
Dirtie_Sinkie
1 year, 1 month ago
Selected Answer: A
Gonna go with A on this one. Some toxic comments will still make it through if you choose D, whereas A addresses the problem fully and directly. Therefore I think A is a more complete answer than D.
upvoted 1 times
Dirtie_Sinkie
1 year, 1 month ago
Even though in the question it says "Your team has a limited budget and is already overextended" I still think A is the better answer because it doesn't take much effort to create synthetic data and add it to train. The outcome will be more accurate than D.
upvoted 4 times
f084277
12 months ago
Of course A is "better", but it ignores the constraints of the question and is therefore wrong.
upvoted 1 times
...
Fer660
2 months, 2 weeks ago
Makes sense. Adding the synthetic data is a quick job. Raising the threshold might have broad negative impact on the classifier -- fix one issue but introduce many more.
upvoted 2 times
...
...
...
baimus
1 year, 2 months ago
Selected Answer: A
A is better than D, because D means that more genuinely toxic comments will make it through. A will teach the model to acknowledge the small subset of mislabelled comments, without exposing the customers to additional toxicity.
upvoted 2 times
...
AzureDP900
1 year, 4 months ago
option A (Add synthetic training data where those phrases are used in non-toxic ways) directly addresses the specific issue of bias and improves the model's accuracy by providing more contextually relevant training examples. This approach is more targeted and has a lower risk of introducing new biases or negatively impacting other aspects of comment moderation. I hope this additional explanation helps clarify why option D might not be the best choice in this scenario!
upvoted 2 times
AzureDP900
1 year, 4 months ago
Raising the threshold would mean increasing the minimum score required for a comment to be classified as toxic or harmful. This could potentially reduce the number of false positives (benign comments being misclassified as toxic) by making it harder for the model to classify a comment as toxic.
upvoted 1 times
...
...
Simple_shreedhar
1 year, 5 months ago
A option directly addresses the bias issue without incurring significant ongoing costs or burdening the moderation team. By augmenting the training dataset with synthetic examples where phrases related to underrepresented religious groups are used in non-toxic ways, the classifier can learn to distinguish between toxic and benign comments more accurately.
upvoted 2 times
...
gscharly
1 year, 6 months ago
Selected Answer: D
agree with daidai75
upvoted 1 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: D
Your team has a limited budget and is already overextended
upvoted 2 times
...
7cb0ab3
1 year, 7 months ago
Selected Answer: A
I went for A because it directly tackles the issue of misclassification and improves the model's understanding of religious references. B and C don't make sense. D would generally reduce the number of comments flagged as toxic, which could decrease the false positive rate. However, this approach risks allowing genuinely harmful comments to go unflagged. It addresses the symptom (high false positive rate) rather than the underlying cause.
upvoted 2 times
...
edoo
1 year, 8 months ago
Selected Answer: A
B and C are nonsense. I don't want to risk potentially increasing the FNR by reducing the FPR (raising the threshold). Thus A.
upvoted 1 times
...
daidai75
1 year, 9 months ago
Selected Answer: D
Your team has a limited budget and is already overextended, that means the re-training is hardly possible.
upvoted 2 times
...
tavva_prudhvi
2 years, 3 months ago
In the long run, usually we go with A, but Option D could be a temporary solution to reduce false positives, while being aware that it may allow some genuinely toxic comments to go unnoticed. However, this may be a necessary trade-off until your team has the resources to improve the classifier or find a better solution.
upvoted 1 times
...
powerby35
2 years, 3 months ago
Selected Answer: D
"Your team has a limited budget and is already overextended"
upvoted 2 times
...
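For those picking A: the augmentation can be as simple as templating benign sentences that mention the affected groups and labeling them non-toxic. A minimal sketch (the template strings and group names below are illustrative, not from the question):

```python
import itertools

# Illustrative templates and group names -- in practice these would come
# from reviewing the false positives the classifier actually produced.
templates = [
    "I am proud to be {g}.",
    "My neighbor is {g} and very kind.",
    "The local {g} community organized a charity drive.",
]
groups = ["Muslim", "Jewish", "Hindu", "Sikh", "Buddhist"]

# Each synthetic example pairs a benign sentence with the non-toxic label (0),
# teaching the model that these references are not toxic by themselves.
synthetic = [(t.format(g=g), 0) for t, g in itertools.product(templates, groups)]
```

These examples would then be mixed into the existing training set before retraining, which is the low-effort retraining step option A relies on.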

Topic 1 Question 160

You work for a magazine distributor and need to build a model that predicts which customers will renew their subscriptions for the upcoming year. Using your company’s historical data as your training set, you created a TensorFlow model and deployed it to Vertex AI. You need to determine which customer attribute has the most predictive power for each prediction served by the model. What should you do?

  • A. Stream prediction results to BigQuery. Use BigQuery’s CORR(X1, X2) function to calculate the Pearson correlation coefficient between each feature and the target variable.
  • B. Use Vertex Explainable AI. Submit each prediction request with the 'explain' keyword to retrieve feature attributions using the sampled Shapley method.
  • C. Use Vertex AI Workbench user-managed notebooks to perform a Lasso regression analysis on your model, which will eliminate features that do not provide a strong signal.
  • D. Use the What-If tool in Google Cloud to determine how your model will perform when individual features are excluded. Rank the feature importance in order of those that caused the most significant performance drop when removed from the model.
Suggested Answer: B 🗳️

Comments

fitri001
Highly Voted 1 year ago
Selected Answer: B
Feature Importance per Prediction: Vertex Explainable AI with the Shapley method provides feature attributions for each individual prediction. This allows you to understand which attributes were most influential in the model's decision for that specific customer. No Code Required: This approach leverages a built-in Vertex AI service and doesn't require writing additional code for Lasso regression (option C) or using the What-If tool (option D).
upvoted 5 times
...
el_vampiro
Most Recent 2 months, 1 week ago
Selected Answer: D
Explainable AI tells you why a particular prediction was generated for a particular input; It doesn't discuss prediction power of columns in the model. For that, you need to use the What If tool. From https://cloud.google.com/blog/products/ai-machine-learning/introducing-the-what-if-tool-for-cloud-ai-platform-models : "Your test examples should include the ground truth labels so you can explore how different features impact your model’s predictions. "
upvoted 1 times
...
7cb0ab3
1 year, 1 month ago
Selected Answer: B
I went for B, but not sure why it is not D. Is it even possible to model time series with the What If tool?
upvoted 1 times
...
Mickey321
1 year, 6 months ago
Selected Answer: B
Option B
upvoted 3 times
...
PST21
1 year, 9 months ago
Selected Answer: B
to determine which customer attribute has the most predictive power for each prediction served by the model, you should use Vertex Explainable AI (Option B) with the 'explain' keyword to retrieve feature attributions using the sampled Shapley method. This will give you insights into feature importance at the individual prediction level, allowing you to understand the model's behavior for specific customers.
upvoted 3 times
...
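For intuition about what option B returns: here is a pure-Python sketch of the exact Shapley attribution that Vertex Explainable AI's sampled Shapley method approximates (the toy linear "renewal" scorer, weights, and inputs are made up for illustration):

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley attributions for prediction f(x) against a baseline.

    For each feature subset S, features in S take their value from x and the
    rest from the baseline; the attribution of feature i is the weighted
    average marginal contribution of adding i to S. Vertex Explainable AI's
    *sampled* Shapley method approximates this sum by sampling permutations.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

# Toy "subscription renewal" scorer over three customer attributes
# (weights and inputs are invented for illustration).
weights = [0.5, -0.2, 0.8]
model = lambda v: sum(w * vi for w, vi in zip(weights, v))

x = [1.0, 2.0, 3.0]          # the customer being explained
baseline = [0.0, 0.0, 0.0]   # reference customer

attributions = shapley_values(model, x, baseline)
# For a linear model this reduces to w_i * (x_i - baseline_i): [0.5, -0.4, 2.4]
```

Note the attribution is per prediction: a different customer `x` gets different attributions, which is exactly why per-request explanations (rather than a global correlation as in option A) answer this question.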

Topic 1 Question 161

You are an ML engineer at a manufacturing company. You are creating a classification model for a predictive maintenance use case. You need to predict whether a crucial machine will fail in the next three days so that the repair crew has enough time to fix the machine before it breaks. Regular maintenance of the machine is relatively inexpensive, but a failure would be very costly. You have trained several binary classifiers to predict whether the machine will fail, where a prediction of 1 means that the ML model predicts a failure.

You are now evaluating each model on an evaluation dataset. You want to choose a model that prioritizes detection while ensuring that more than 50% of the maintenance jobs triggered by your model address an imminent machine failure. Which model should you choose?

  • A. The model with the highest area under the receiver operating characteristic curve (AUC ROC) and precision greater than 0.5
  • B. The model with the lowest root mean squared error (RMSE) and recall greater than 0.5.
  • C. The model with the highest recall where precision is greater than 0.5.
  • D. The model with the highest precision where recall is greater than 0.5.
Suggested Answer: C 🗳️

Comments

NamitSehgal
8 months, 3 weeks ago
Selected Answer: C
A (High AUC ROC, Precision > 0.5): AUC ROC is a good overall metric, but it doesn't directly address the specific priorities of this problem. A model with a high AUC might have a good balance of precision and recall on average, but it might not have the highest recall while maintaining the precision threshold. The focus here is on maximizing recall subject to the precision constraint.
upvoted 1 times
...
AzureDP900
1 year, 4 months ago
C. The model with the highest recall where precision is greater than 0.5. In this predictive maintenance use case, you want to prioritize detection (i.e., detecting imminent failures) while ensuring that most of the maintenance jobs triggered by your model address actual machine failures (i.e., true positives). Recall measures the proportion of actual failures detected by the model, which aligns with your goal of prioritizing detection.
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: C
Prioritizing Detection: Recall measures how well the model identifies true positives (correctly predicts failures). A high recall ensures most imminent failures are caught. Balancing with Precision: Precision measures how many of the predicted failures are true positives (avoiding unnecessary maintenance). The requirement of a precision greater than 0.5 ensures a reasonable number of triggered maintenances actually address failures.
upvoted 4 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: C
went with C
upvoted 1 times
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: C
Early detection of potential failures is crucial, even if it leads to some unnecessary maintenance ("false positives"). Therefore, we prioritize recall, which measures the ability to correctly identify true failures. While detection is important, we don't want to trigger too many unnecessary repairs ("false positives"). So, we set a minimum threshold of precision greater than 0.5, meaning at least 50% of triggered maintenance should address real failures.
upvoted 3 times
...
vfg
1 year, 10 months ago
Selected Answer: C
Priority is to detect (pointing to recall) and correctly detect (more than 50%, pointing to precision).
upvoted 2 times
...
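The selection rule in option C is easy to state in code. A minimal sketch with made-up evaluation counts (the tp/fp/fn numbers are hypothetical, not from the question):

```python
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

# Hypothetical evaluation counts (true positives, false positives, false
# negatives) for three candidate failure classifiers.
models = {
    "m1": (40, 50, 10),   # precision ~0.44: fails the >0.5 constraint
    "m2": (45, 30, 5),    # precision 0.60, recall 0.90
    "m3": (48, 50, 2),    # precision ~0.49: fails the constraint
}

# Option C: among models whose precision exceeds 0.5 (more than half of the
# triggered maintenance jobs address a real failure), pick the highest recall.
eligible = {name: recall(tp, fn)
            for name, (tp, fp, fn) in models.items()
            if precision(tp, fp) > 0.5}
best = max(eligible, key=eligible.get)
print(best)  # m2
```

The precision threshold acts as a hard constraint and recall is maximized subject to it, which matches the wording "prioritizes detection while ensuring that more than 50% of the maintenance jobs ... address an imminent machine failure."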

Topic 1 Question 162

You built a custom ML model using scikit-learn. Training time is taking longer than expected. You decide to migrate your model to Vertex AI Training, and you want to improve the model’s training time. What should you try out first?

  • A. Train your model in a distributed mode using multiple Compute Engine VMs.
  • B. Train your model using Vertex AI Training with CPUs.
  • C. Migrate your model to TensorFlow, and train it using Vertex AI Training.
  • D. Train your model using Vertex AI Training with GPUs.
Suggested Answer: B 🗳️

Comments

OpenKnowledge
3 weeks, 6 days ago
Selected Answer: B
Option B as the very 1st option to try out
upvoted 1 times
...
dija123
1 month ago
Selected Answer: D
B) Train using Vertex AI Training with CPUs? This is likely what's already being done, just on a different platform!
upvoted 1 times
...
b7ad1d9
1 month, 3 weeks ago
Selected Answer: D
Scikit learn does have GPU support now!!
upvoted 2 times
...
desertlotus1211
8 months, 1 week ago
Selected Answer: A
Scikit-learn generally relies on CPU-based computations and does not natively leverage GPUs for most algorithms. Answer A is the best first step to improve training time without sacrificing model performance
upvoted 2 times
...
vini123
9 months, 1 week ago
Selected Answer: B
Minimal changes – You can quickly migrate your existing scikit-learn code to Vertex AI Training using CPU instances. ✅ Vertex AI prebuilt containers already support scikit-learn with CPU (no extra setup needed). ✅ Lower cost than distributed training or switching to another framework. ✅ Good for establishing a baseline – Once you see how long it takes on Vertex AI, you can decide if further optimization (like distributed training) is needed.
upvoted 1 times
...
lunalongo
11 months, 1 week ago
Selected Answer: B
B) The statement asks the FIRST STEP to take. Considering: - Scikit-learn's limited and non-universal GPU support - Higher cost associated with GPU instances The first sensible approach would indeed be to first migrate the model to Vertex AI using CPUs to establish a baseline training time. This allows for a direct comparison with the existing training setup and helps determine if the improvement from CPU to GPU is necessary.
upvoted 2 times
...
rajshiv
11 months, 1 week ago
Selected Answer: D
I think it is D. The optimal approach to improve training time in Vertex AI Training is to leverage the parallel processing power of GPUs.
upvoted 1 times
...
TanTran04
1 year, 4 months ago
Selected Answer: B
Scikit-learn is not intended to be used as a deep-learning framework and it does not provide any GPU support. (Ref: https://stackoverflow.com/questions/41567895/will-scikit-learn-utilize-gpu). So I go with B
upvoted 2 times
...
AzureDP900
1 year, 4 months ago
You decided to migrate to Vertex AI, If you have a model that requires significant computational resources and doesn't rely heavily on specialized GPU operations (like those in option D), then option B might still be a good choice. However, if your model is computationally intensive or involves complex neural network architectures I would go with D instead of B.
upvoted 1 times
...
AnnaR
1 year, 6 months ago
B is correct, because scikit only has CPU support for the following services: - prebuilt containers for custom training (this is the case here) - prebuilt containers for predictions and explanations - Vertex AI Pipelines - Vertex AI Workbench user-managed notebooks https://cloud.google.com/vertex-ai/docs/supported-frameworks-list#scikit-learn_2
upvoted 4 times
...
Carlose2108
1 year, 8 months ago
Selected Answer: B
scikit-learn no GPU support.
upvoted 1 times
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: D
Scikit-learn doesn't natively support GPUs for training. However, many scikit-learn algorithms rely on libraries like NumPy and SciPy. These libraries can leverage GPUs if they're available on the system, potentially benefiting scikit-learn models indirectly.
upvoted 1 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: B
SK-Learn offers no GPU support. Answer is B!
upvoted 3 times
...
VMHarry
1 year, 10 months ago
Selected Answer: D
GPU helps speeding up training process
upvoted 1 times
...
vale_76_na_xxx
1 year, 10 months ago
Why not A?
upvoted 2 times
...
mlx
1 year, 11 months ago
B. Train your model using Vertex AI Training with CPUs. No GPUs for scikit-learn, but parallelizing/distributing training is a good way to speed up model building.
upvoted 2 times
...
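For reference, moving the existing scikit-learn code to Vertex AI Training on CPUs (option B) is mostly a matter of pointing a CustomJob at a CPU machine and a training container. A rough sketch; the project, bucket, image URI, machine type, and entry point below are placeholders, not from the question:

```python
from google.cloud import aiplatform

# Placeholders -- substitute your own project, region, and staging bucket.
aiplatform.init(project="my-project", location="us-central1",
                staging_bucket="gs://my-staging-bucket")

job = aiplatform.CustomJob(
    display_name="sklearn-training",
    worker_pool_specs=[{
        "machine_spec": {"machine_type": "n1-standard-8"},  # CPUs only
        "replica_count": 1,
        "container_spec": {
            # Illustrative container image and entry point; a prebuilt
            # scikit-learn CPU training container would go here.
            "image_uri": "us-docker.pkg.dev/vertex-ai/training/sklearn-cpu.1-0:latest",
            "command": ["python", "-m", "trainer.task"],
        },
    }],
)
job.run()
```

Swapping in a GPU would mean adding `accelerator_type`/`accelerator_count` to `machine_spec`, but as several comments note, vanilla scikit-learn would not use it.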

Topic 1 Question 163

You are an ML engineer at a retail company. You have built a model that predicts a coupon to offer an ecommerce customer at checkout based on the items in their cart. When a customer goes to checkout, your serving pipeline, which is hosted on Google Cloud, joins the customer's existing cart with a row in a BigQuery table that contains the customers' historic purchase behavior and uses that as the model's input. The web team is reporting that your model is returning predictions too slowly to load the coupon offer with the rest of the web page. How should you speed up your model's predictions?

  • A. Attach an NVIDIA P100 GPU to your deployed model’s instance.
  • B. Use a low latency database for the customers’ historic purchase behavior.
  • C. Deploy your model to more instances behind a load balancer to distribute traffic.
  • D. Create a materialized view in BigQuery with the necessary data for predictions.
Suggested Answer: B 🗳️

Comments

Begum
6 months ago
Selected Answer: D
Answer B: not correct; we do not intend to change the architecture of the existing setup. Introducing a materialized view is a better way to address the latency.
upvoted 2 times
...
desertlotus1211
8 months, 1 week ago
Selected Answer: B
You want to use Bigtable, Firestore, Memorystore, or maybe Redis.
upvoted 2 times
...
NamitSehgal
8 months, 3 weeks ago
Selected Answer: D
Materialized views directly address this bottleneck by pre-computing the join.
upvoted 3 times
...
vini123
9 months, 1 week ago
Selected Answer: B
If the primary issue is real-time access and speed, Option B is probably the better choice, as low-latency databases are built specifically for that purpose.
upvoted 1 times
...
DaleR
11 months, 1 week ago
Selected Answer: D
Keep everything in BigQuery. Migrating to a fast database is more complex and can potentially introduce challenges.
upvoted 3 times
rajshiv
11 months, 1 week ago
Agree. MV is better.
upvoted 1 times
...
...
f084277
12 months ago
Selected Answer: B
Unclear how an MV would help retrieve a single row any faster. Something like BigTable (a low latency database) would be much faster.
upvoted 3 times
...
inc_dev_ml_001
1 year, 3 months ago
Selected Answer: B
It says that you have to join the cart data, so you can't use the materialized view, because you would have to materialize the view every time a new cart shows up. So using a low-latency DB is the only way.
upvoted 3 times
Sivaram06
10 months ago
But the cart data is already available in BigQuery. Hence choosing a materialized view is a good option, as it can pre-compute the join between the customer's cart and their historical data in BigQuery, reducing the latency of data retrieval.
upvoted 1 times
...
...
inc_dev_ml_001
1 year, 4 months ago
Selected Answer: B
In my opinion the materialized view could be the best way, but it says that the cart data has to be joined with the historic behaviour, so it's impossible to have all the needed data for the prediction in the materialized view, because the cart data is not in the database.
upvoted 4 times
...
SausageMuffins
1 year, 5 months ago
Selected Answer: D
Both B and D in theory do reduce latency, but B implies that we might need to migrate the database to another low-latency database. This migration and setup might incur additional costs and effort. In contrast, creating a materialized view seems much more straightforward since there is already a preexisting BigQuery table mentioned in the question.
upvoted 1 times
f084277
12 months ago
Sure, but the question asks about SPEED, not cost and effort
upvoted 3 times
...
...
Ria_1989
1 year, 6 months ago
The coupon offered to an ecommerce customer at checkout is based on the items in their cart, not the customer's historic behaviour. That's what creates confusion while choosing B.
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: D
Reduced Join Cost: Joining the customer's cart with their purchase history in BigQuery during each prediction can be slow. A materialized view pre-computes and stores the join results, eliminating the need for repetitive joins and significantly reducing latency. Targeted Data Access: Materialized views allow you to specify the exact columns needed for prediction, minimizing data transferred between BigQuery and your serving pipeline.
upvoted 2 times
pinimichele01
1 year, 6 months ago
https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#online_real-time_prediction i'm not sure that bq is the best option, what do you think?
upvoted 2 times
...
...
gscharly
1 year, 6 months ago
Selected Answer: B
https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#online_real-time_prediction "Analytical data stores such as BigQuery are not engineered for low-latency singleton read operations, where the result is a single row with many columns."
upvoted 4 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: B
I changed my mind. B: I read this page a lot: https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#online_real-time_prediction If the web team is reporting that the model is returning predictions too slowly to load the coupon offer with the rest of the web page, it suggests that the bottleneck might indeed be in the inference process rather than in data retrieval or processing. Given that the model is deployed on Google Cloud, a low-latency database is suitable for scenarios where quick access to data is crucial, such as real-time predictions for web applications. Option D: while pre-aggregating data in BigQuery can improve query speed, it might not be as efficient as a low-latency database for frequently accessed data like customer purchase history.
upvoted 4 times
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: D
Firstly, I believe the correct choice should be B. This is supported by a comprehensive Google page discussing methods to minimize real-time prediction latency. In this resource, they don't mention using a BigQuery view but instead suggest precomputing and lookup approaches to minimize prediction time. https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#online_real-time_prediction However, I will stick with option D because it's not clear whether option B suggests changing the entire database or just utilizing it as a preliminary step for online prediction.
upvoted 1 times
guilhermebutzke
1 year, 8 months ago
I changed to B.
upvoted 1 times
...
...
sonicclasps
1 year, 9 months ago
Selected Answer: D
Queries that use materialized views are generally faster and consume fewer resources than queries that retrieve the same data only from the base tables. Materialized views can significantly improve the performance of workloads that have the characteristic of common and repeated queries.
upvoted 2 times
...
ddogg
1 year, 9 months ago
Selected Answer: D
D. Create a materialized view in BigQuery with the necessary data for predictions. Here's why: Current bottleneck: Joining the cart data with the BigQuery table containing historic purchases likely creates the latency bottleneck. Fetching data from BigQuery on every prediction request can be slow. Materialized view: A materialized view pre-computes and stores the join between the cart data and the relevant historic purchase information in BigQuery. This eliminates the need for real-time joins during prediction, significantly reducing latency. Faster access: The pre-computed data in the materialized view is readily available within BigQuery, ensuring faster access for your serving pipeline when predicting the coupon offer. Lower cost: Compared to additional instances or GPU resources, a materialized view can be a more cost-effective solution, especially if prediction requests are frequent.
upvoted 3 times
...
kalle_balle
1 year, 10 months ago
Selected Answer: B
Option B seems most sensible.
upvoted 1 times
...
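For those leaning D, the pre-aggregation would look roughly like the sketch below. The dataset, table, and column names are invented for illustration, and note that BigQuery materialized views only support a limited query shape (e.g. aggregations over base tables), which is part of the debate above:

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses default project credentials

# Hypothetical schema: pre-aggregate each customer's purchase history so the
# serving pipeline reads one small pre-computed row at prediction time
# instead of aggregating raw history on every request.
ddl = """
CREATE MATERIALIZED VIEW shop.customer_features AS
SELECT
  customer_id,
  COUNT(*) AS purchase_count,
  SUM(order_total) AS lifetime_value,
  MAX(order_date) AS last_purchase_date
FROM shop.purchase_history
GROUP BY customer_id
"""
client.query(ddl).result()
```

The join with the live cart still happens at serving time either way; the disagreement in the thread is whether a pre-computed BigQuery row is fast enough for a singleton read, or whether a low-latency store (option B) is needed.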

Topic 1 Question 164

You work for a small company that has deployed an ML model with autoscaling on Vertex AI to serve online predictions in a production environment. The current model receives about 20 prediction requests per hour with an average response time of one second. You have retrained the same model on a new batch of data, and now you are canary testing it, sending ~10% of production traffic to the new model. During this canary test, you notice that prediction requests for your new model are taking between 30 and 180 seconds to complete. What should you do?

  • A. Submit a request to raise your project quota to ensure that multiple prediction services can run concurrently.
  • B. Turn off auto-scaling for the online prediction service of your new model. Use manual scaling with one node always available.
  • C. Remove your new model from the production environment. Compare the new model and existing model codes to identify the cause of the performance bottleneck.
  • D. Remove your new model from the production environment. For a short trial period, send all incoming prediction requests to BigQuery. Request batch predictions from your new model, and then use the Data Labeling Service to validate your model’s performance before promoting it to production.
Suggested Answer: C 🗳️

Comments

sonicclasps
Highly Voted 1 year, 9 months ago
Selected Answer: B
Sounds to me like the new model has too few requests per hour and therefore scales down to 0, which means it has to create an instance every time it serves a request, and this takes time. By manually setting the number of nodes, the nodes will always be running, whether or not they are serving predictions.
upvoted 6 times
...
OpenKnowledge
Most Recent 1 month ago
Selected Answer: C
The current model receives about 20 prediction requests per hour with an average response time of one second -- this clearly indicates autoscaling is not an issue for the current model, although 20 requests per hour comes nowhere near full capacity for the hour. So it looks like the new model taking 180 seconds to respond is not due to a node not running all the time.
upvoted 1 times
...
billyst41
1 month, 3 weeks ago
Selected Answer: C
A Vertex AI online prediction endpoint cannot scale to zero instances when it is idle. The minimum number of replica nodes you can configure for a deployed model is one, which means at least one instance will always be running and incurring costs.
upvoted 3 times
...
kirukkuman
4 months, 1 week ago
Selected Answer: C
The new model version is failing badly, with response times up to 180 seconds being unacceptable for an online service. The absolute first priority is to stop impacting users. Removing the new model from the canary test and routing all traffic back to the stable, existing version immediately mitigates the problem. Once the production environment is stable, you can begin your root cause analysis. The problem is a performance bottleneck, not an issue with scaling or infrastructure. Since the only thing that changed was the retrained model, the cause must lie within the new model artifact or its prediction code. Comparing the new version with the old is the most logical way to find what changed to cause the drastic slowdown.
upvoted 2 times
...
Begum
5 months, 3 weeks ago
Selected Answer: C
Need to check the performance bottlenecks
upvoted 1 times
...
desertlotus1211
8 months, 1 week ago
Selected Answer: C
You're performing 20 predictions an hour - so scaling isn’t the root issue. Code issue.
upvoted 2 times
...
vini123
9 months, 1 week ago
Selected Answer: B
Since the same model is being used and the only change is the data, it's likely that the latency issue is caused by how Vertex AI is scaling the prediction service.
upvoted 3 times
...
potomeek
10 months ago
Selected Answer: C
Removing the new model from production to debug and address the root cause of the latency issue is the most efficient and logical course of action. This ensures minimal disruption to production services and lays the groundwork for a smooth rollout after fixing the bottleneck
upvoted 1 times
...
YushiSato
1 year, 3 months ago
I don't see B as the right answer. The Vertex AI Endpoint cannot scale to 0 for newer version of the model. > When you configure a DeployedModel, you must set dedicatedResources.minReplicaCount to at least 1. In other words, you cannot configure the DeployedModel to scale to 0 prediction nodes when it is unused. https://cloud.google.com/vertex-ai/docs/general/deployment#scaling
upvoted 3 times
YushiSato
1 year, 3 months ago
I was convinced because the machines that are autoscaled by the Vertex AI Endpoint seem to be tied to the endpoint, not the model deployed to it.
upvoted 1 times
...
...
AnnaR
1 year, 6 months ago
Selected Answer: B
B can be effective in controlling the resources available to the new model, ensuring that it is not delayed by the autoscaling trying to scale up from 0. Not A: there is no indication in the description that quota limits cause the slowdown, and it does not address the issue where the new model is performing poorly on canary testing. Not C: when you pull the new model from the prod environment, you could affect end-user experience. Not D: same as C, plus you rely on batch predictions, which does not align with the need for online, real-time predictions in the prod environment. Data Labeling Service is more about assessing accuracy and less about resolving latency issues.
upvoted 3 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: B
You have retrained the same model on a new batch of data
upvoted 2 times
pinimichele01
1 year, 6 months ago
The new model has too few requests per hour and therefore scales down to 0, which means it has to create an instance every time it serves a request, and this takes time. By manually setting the number of nodes, the nodes will always be running, whether or not they are serving predictions.
upvoted 5 times
...
...
VipinSingla
1 year, 7 months ago
Selected Answer: B
The bottleneck seems to be node startup; as there is a very low number of requests, having one node always available will help in this case.
upvoted 1 times
...
Aastha_Vashist
1 year, 7 months ago
Selected Answer: C
went with c
upvoted 1 times
rajshiv
11 months, 1 week ago
I also think C. The model performance issue needs to be addressed.
upvoted 1 times
...
...
Carlose2108
1 year, 8 months ago
Selected Answer: C
I went C. Diagnosing the root cause.
upvoted 1 times
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: C
Choose C. The significant increase in response time from 1 second to between 30 and 180 seconds indicates a performance issue with the new model. Before making any further changes or decisions, it's crucial to identify the root cause of this performance bottleneck. By comparing the code of the new model with the existing model, you can pinpoint any differences that might be causing the slowdown. In A, this may not be the root cause and could incur unnecessary costs without addressing the performance issue. In B, it doesn't address the underlying issue causing the significant increase in response time observed during canary testing. In D, this would significantly increase latency and hinder real-time predictions, negatively impacting user experience.
upvoted 2 times
vaibavi
1 year, 9 months ago
But in the question it says "You have retrained the same model on a new batch of data" -- it's just the data that changed, so there is no need for a code check.
upvoted 2 times
lunalongo
11 months, 1 week ago
B is still right because:
- Retraining often involves adjustments to hyperparameters or training processes.
- Changes to data preprocessing steps (e.g., feature scaling, handling missing values) during retraining can change model code and affect model performance.
- The retraining process itself might have introduced unknown bugs or inefficiencies into the model's deployment pipeline or the code that interacts with the model.
upvoted 1 times
...
...
...
b1a8fae
1 year, 10 months ago
Unsure on this one, but I would go with A.
B. Turning off auto-scaling is a good measure when dealing with datasets with steep spikes of request traffic (here we are dealing with avg. 20 requests per hour). "The service may not be able to bring nodes online fast enough to keep up with large spikes of request traffic." https://cloud.google.com/blog/products/ai-machine-learning/scaling-machine-learning-predictions
C. You retrain the SAME model on a different batch of data. It is implied that the code is the same too?
D. Actual quality of the model is not in question here, but rather the long prediction time per request.
Even if the request traffic is very low, I can only consider option A: the selected quota cannot deal with the amount of concurrent prediction requests.
upvoted 1 times
...
kalle_balle
1 year, 10 months ago
Selected Answer: C
Option B or D is completely wrong. Option A to raise the quota might be necessary in some situations but doesn't necessarily deal with the performance issue at the test. Option C seems like the most suitable option.
upvoted 1 times
edoo
1 year, 8 months ago
You only retrained the same model, your code hasn't changed, you won't find anything with C. It's B.
upvoted 1 times
...
...

Topic 1 Question 165


You want to train an AutoML model to predict house prices by using a small public dataset stored in BigQuery. You need to prepare the data and want to use the simplest, most efficient approach. What should you do?

  • A. Write a query that preprocesses the data by using BigQuery and creates a new table. Create a Vertex AI managed dataset with the new table as the data source.
  • B. Use Dataflow to preprocess the data. Write the output in TFRecord format to a Cloud Storage bucket.
  • C. Write a query that preprocesses the data by using BigQuery. Export the query results as CSV files, and use those files to create a Vertex AI managed dataset.
  • D. Use a Vertex AI Workbench notebook instance to preprocess the data by using the pandas library. Export the data as CSV files, and use those files to create a Vertex AI managed dataset.
Suggested Answer: A 🗳️

Comments

vini123
9 months, 1 week ago
Selected Answer: A
BigQuery integration with Vertex AI: BigQuery is fully integrated with Vertex AI, which means you can directly use BigQuery as a data source for Vertex AI managed datasets. By writing a query to preprocess the data and then creating a Vertex AI managed dataset from that query, you can skip extra steps like exporting or converting data into different formats. This is both efficient and leverages the native capabilities of the GCP platform
upvoted 2 times
...
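To make option A concrete, here is a hedged sketch of the kind of preprocessing query it implies (project, dataset, table, and column names are all made up for illustration): one query materializes a clean table, which then serves directly as the data source for a Vertex AI managed dataset, with no export step.

```sql
-- Hypothetical preprocessing query; all names are illustrative.
CREATE OR REPLACE TABLE `my_project.housing.prepared_houses` AS
SELECT
  price,                                    -- label column
  IFNULL(num_bedrooms, 0) AS num_bedrooms,  -- impute missing values
  sqft / 1000.0 AS sqft_thousands,          -- simple rescaling
  neighborhood
FROM `my_project.housing.raw_houses`
WHERE price IS NOT NULL;                    -- drop rows without a label
```

The resulting table can then be selected as the BigQuery source when creating the Vertex AI managed dataset, keeping the data in place the whole time.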
PhilipKoku
1 year, 5 months ago
Selected Answer: A
A) Keep the data in BigQuery and create a new table to avoid latency moving data out of BigQuery
upvoted 4 times
...
nmnm22
1 year, 5 months ago
Selected Answer: A
A seems the correct one
upvoted 1 times
...
gscharly
1 year, 6 months ago
Selected Answer: A
I go for A:
upvoted 1 times
...
shadz10
1 year, 10 months ago
Selected Answer: A
can export directly from big query as vertex ai managed dataset to use train an autoML model
upvoted 1 times
...
36bdc1e
1 year, 10 months ago
A By writing a query that preprocesses the data using BigQuery and creating a new table, you can directly create a Vertex AI managed dataset with the new table as the data source. This approach is efficient because it leverages BigQuery’s powerful data processing capabilities and avoids the need to export data to another format or service. It also simplifies the process by keeping everything within the Google Cloud ecosystem. This makes it easier to manage and monitor your data and model training process.
upvoted 3 times
...
vale_76_na_xxx
1 year, 10 months ago
I go for A:
upvoted 2 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: A
Forgot to vote
upvoted 1 times
...
b1a8fae
1 year, 10 months ago
A seems the easiest to me: preprocess the data on BigQuery (where the input table is stored) and export directly as Vertex AI managed dataset.
upvoted 2 times
...
kalle_balle
1 year, 10 months ago
Selected Answer: B
Dataflow seems like the easiest and most scalable way to deal with this issue. Option B.
upvoted 1 times
pinimichele01
1 year, 6 months ago
small dataset -> no dataflow
upvoted 1 times
...
f084277
12 months ago
The data is already in BigQuery. Preprocess the data in BigQuery. How is Dataflow easier than BigQuery? (question doesn't mention anything about scalability)
upvoted 1 times
...
...

Topic 1 Question 166


You developed a Vertex AI ML pipeline that consists of preprocessing and training steps and each set of steps runs on a separate custom Docker image. Your organization uses GitHub and GitHub Actions as CI/CD to run unit and integration tests. You need to automate the model retraining workflow so that it can be initiated both manually and when a new version of the code is merged in the main branch. You want to minimize the steps required to build the workflow while also allowing for maximum flexibility. How should you configure the CI/CD workflow?

  • A. Trigger a Cloud Build workflow to run tests, build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines.
  • B. Trigger GitHub Actions to run the tests, launch a job on Cloud Run to build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines.
  • C. Trigger GitHub Actions to run the tests, build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines.
  • D. Trigger GitHub Actions to run the tests, launch a Cloud Build workflow to build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines.
Suggested Answer: D 🗳️

Comments

pikachu007
Highly Voted 1 year, 10 months ago
Selected Answer: C
Considering the goal of minimizing steps while allowing for flexibility, option C - "Trigger GitHub Actions to run the tests, build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines" appears to be the most straightforward approach. It leverages GitHub Actions for testing and image building, then directly triggers the Vertex AI Pipelines, simplifying the workflow and reducing unnecessary services involved in the process.
upvoted 8 times
...
AnnaR
Highly Voted 1 year, 6 months ago
Selected Answer: D
Not A: does not leverage the integration capabilities of GitHub Actions with GitHub for initial testing, which is more efficient when managing repo triggers and workflows directly from GitHub.
Not B: Cloud Run is for running stateless containers, not for CI/CD tasks like building and pushing images.
Not C: building Docker images directly in GitHub Actions can encounter limits in terms of build performance and resource availability, esp. for complex images.
upvoted 5 times
...
desertlotus1211
Most Recent 8 months, 1 week ago
Selected Answer: C
Since your team already uses GitHub and GitHub Actions as part of your CI/CD process (including running unit and integration tests), it's most efficient to extend your existing workflow to also handle the packaging and deployment of your ML pipeline. It avoids introducing extra services (i.e., Cloud Run and Cloud Build)...
upvoted 1 times
...
NamitSehgal
8 months, 3 weeks ago
Selected Answer: D
Cloud Build is a dedicated service for building container images.
upvoted 1 times
...
vini123
9 months, 1 week ago
Selected Answer: D
Option D is the most suitable answer because it maximizes flexibility, optimizes the image creation process, and integrates well with the Vertex AI Pipelines workflow.
upvoted 1 times
...
lunalongo
11 months, 1 week ago
Selected Answer: C
C) GitHub Actions can directly build the Docker images, push them to Artifact Registry, and then trigger the Vertex AI pipeline execution.
*A&D) Add complexity by adding Cloud Build.
*B) Adds Cloud Run for building/pushing Docker images, but GitHub Actions can do this. See how: https://medium.com/@sbkapelner/building-and-pushing-to-artifact-registry-with-github-actions-7027b3e443c1
upvoted 2 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: D
Maximum flexibility needed. Hence D, not C
upvoted 1 times
...
bfdf9c8
1 year, 3 months ago
Selected Answer: A
The correct answer is A. I think it is tricky because D is possible, but it adds one step, and we want to minimize the steps.
upvoted 2 times
...
AzureDP900
1 year, 4 months ago
option D might seem appealing at first, but it adds unnecessary complexity and makes it more challenging to manage the state of your pipeline. Option C, on the other hand, provides a simpler and more straightforward approach to automating your model retraining workflow using GitHub Actions.
upvoted 1 times
...
gscharly
1 year, 6 months ago
Selected Answer: D
agree with guilhermebutzke
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: D
Security: GitHub Actions are ideal for running unit and integration tests within the controlled environment of your GitHub repository. This keeps your test code separate from the production pipeline code running in Cloud Build.
Scalability and Resource Management: Cloud Build is a managed service specifically designed for building container images in Google Cloud. It offers better resource management and scalability for building Docker images compared to Cloud Run, which is primarily designed for running stateless containers.
Flexibility: This configuration allows for independent scaling of test execution (in GitHub Actions) and image building (in Cloud Build). You can modify the workflow files in each platform independently without affecting the other.
upvoted 2 times
fitri001
1 year, 6 months ago
A & B. Cloud Run for image building: while Cloud Run can build Docker images, it's not its primary function. Cloud Build is a more robust and scalable solution for container image building in Google Cloud.
C. Building images in GitHub Actions: GitHub Actions might have limitations on resource allocation and might not be suitable for building complex Docker images, especially if they have large dependencies.
upvoted 1 times
...
...
pinimichele01
1 year, 7 months ago
Selected Answer: D
i agree with guilhermebutzke
upvoted 1 times
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: D
Choose D: GitHub Actions should be used to run tests and initiate the workflow upon code merges. Then, Cloud Build is a suitable service for building Docker images and handling the subsequent steps of pushing the images to Artifact Registry. So, Vertex AI Pipelines can be launched as part of the Cloud Build workflow for model retraining. In A Using Cloud Build directly from GitHub Actions would bypass GitHub Actions' capabilities for triggering and testing. In B, Cloud Run for building Docker images can introduce potential compatibility issues with Vertex AI Pipelines. In C,  Skipping Cloud Build for image building limits the workflow's portability and integration with Vertex AI. https://cloud.google.com/vertex-ai/docs/pipelines/introduction https://medium.com/@cait.ray13/serving-ml-model-using-google-pub-sub-python-f569c46e7eb0
upvoted 3 times
...
mindriddler
1 year, 9 months ago
Selected Answer: C
It has to be C. There's no need to use both GitHub Actions and Cloud Build when GitHub Actions can do it all by itself.
upvoted 2 times
...
shadz10
1 year, 9 months ago
Selected Answer: D
D https://cloud.google.com/build/docs/building/build-containers https://cloud.google.com/build/docs/build-push-docker-image
upvoted 3 times
...
36bdc1e
1 year, 10 months ago
The best approach would be Option C. By triggering GitHub Actions to run the tests, build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines, you can automate the model retraining workflow. This approach allows for maximum flexibility and minimizes the steps required to build the workflow.
upvoted 2 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: C
I am torn between C and D. GitHub actions to run the tests is definitely the simplest. Cloud Build allows to access fully managed CI/CD workflow (you could setup the Docker build job), but I figure it would be easier to do it from GitHub actions directly (https://docs.github.com/en/actions/creating-actions/creating-a-docker-container-action) which allows you to use 1 tool less and achieve the same result.
upvoted 2 times
...

Topic 1 Question 167


You are working with a dataset that contains customer transactions. You need to build an ML model to predict customer purchase behavior. You plan to develop the model in BigQuery ML, and export it to Cloud Storage for online prediction. You notice that the input data contains a few categorical features, including product category and payment method. You want to deploy the model as quickly as possible. What should you do?

  • A. Use the TRANSFORM clause with the ML.ONE_HOT_ENCODER function on the categorical features at model creation and select the categorical and non-categorical features.
  • B. Use the ML.ONE_HOT_ENCODER function on the categorical features and select the encoded categorical features and non-categorical features as inputs to create your model.
  • C. Use the CREATE MODEL statement and select the categorical and non-categorical features.
  • D. Use the ML.MULTI_HOT_ENCODER function on the categorical features, and select the encoded categorical features and non-categorical features as inputs to create your model.
Suggested Answer: B 🗳️

Comments

BlehMaks
Highly Voted 1 year, 10 months ago
Selected Answer: B
When the TRANSFORM clause is present, only output columns from the TRANSFORM clause are used in training. Any results from query_statement that don't appear in the TRANSFORM clause are ignored. https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create#transform so if you want TRANSFORM then use TRANSFORM for both categorical and non-categorical features
upvoted 5 times
...
b7ad1d9
Most Recent 1 month, 2 weeks ago
Selected Answer: C
BQML automatically creates one hot encoding as part of the process
upvoted 1 times
...
Begum
6 months ago
Selected Answer: B
Option C may be correct, but the CREATE MODEL statement has many options to consider; while some are defaults, others need to be specified explicitly. Just CREATE MODEL is an incomplete answer. However, B is more precise in using ONE_HOT_ENCODER, hence B.
upvoted 1 times
...
vini123
9 months, 1 week ago
Selected Answer: B
ML.ONE_HOT_ENCODER transforms the categorical features into one-hot encoded values. You then select these encoded categorical features along with the non-categorical features to create your model. This is the most common approach for handling categorical features in BigQuery ML for fast deployment.
upvoted 3 times
...
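For readers unfamiliar with the transformation itself, a minimal sketch of what one-hot encoding does to a categorical feature (the category vocabulary below is made up; this is not the BigQuery ML implementation):

```python
# Illustrative one-hot encoding of a categorical feature.
# The payment-method vocabulary is hypothetical.
categories = ["credit_card", "debit_card", "bank_transfer"]

def one_hot(value, categories):
    """Return a 0/1 vector with a 1 in the position of `value`."""
    return [1 if c == value else 0 for c in categories]

print(one_hot("debit_card", categories))  # [0, 1, 0]
```

Each categorical value becomes a sparse numeric vector, which is what most model types need as input.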
potomeek
10 months ago
Selected Answer: C
Using the CREATE MODEL statement with the categorical and non-categorical features directly (Option C) is the simplest, fastest, and most effective way to build and deploy your model in BigQuery ML
upvoted 3 times
...
0e6b9e2
10 months, 2 weeks ago
Selected Answer: C
The create_model statement automatically one-hot encodes categorical features. https://cloud.google.com/bigquery/docs/auto-preprocessing This may not be the best solution in terms of transparency, but the question asked for the "fastest" solution
upvoted 2 times
...
phani49
10 months, 3 weeks ago
Selected Answer: C
BigQuery ML automatically handles categorical features. When you use the CREATE MODEL statement, it recognizes categorical columns and applies appropriate encoding (e.g., one-hot encoding or embeddings) under the hood.
upvoted 3 times
...
YushiSato
1 year, 3 months ago
Selected Answer: A
TRANSFORM is used to transform the input for both learning and inference. ONE_HOT_ENCODER can also be used within TRANSFORM. The other options require conversion on the input in prediction. A is correct.
upvoted 1 times
YushiSato
1 year, 3 months ago
Sorry, BlehMaks is correct. In this case, we don't use TRANSFORM, we need to do the conversion in the forecast as well.
upvoted 1 times
...
...
bobjr
1 year, 5 months ago
Selected Answer: A
CREATE OR REPLACE MODEL `project.dataset.model_name`
TRANSFORM(
  ML.ONE_HOT_ENCODER(product_category) OVER() AS encoded_product_category,
  ML.ONE_HOT_ENCODER(payment_method) OVER() AS encoded_payment_method,
  * EXCEPT(product_category, payment_method)
)
OPTIONS(model_type='logistic_reg') AS
SELECT * FROM `project.dataset.table_name`;
upvoted 3 times
...
pikachu007
1 year, 10 months ago
Selected Answer: B
Given the goal of quickly deploying the model for predicting customer purchase behavior while handling categorical features, option B - "Use the ML.ONE_HOT_ENCODER function on the categorical features and select the encoded categorical features and non-categorical features as inputs to create your model" seems to be the most appropriate. This approach directly handles the encoding of categorical features using one-hot encoding and selects the necessary features for model creation, ensuring efficient utilization of categorical data in the BigQuery ML model.
upvoted 1 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: B
Only B and D make sense. Between the two, after reading the use case of multi-hot encoding (https://cloud.google.com/bigquery/docs/auto-preprocessing#feature-transform), I would tend towards B, since one-hot encoding is preferred in the case of non-numerical, non-array features (product category and payment method are often represented as such), while multi-hot encoding is preferred in the case of non-numerical, array features, which is not the case here.
upvoted 1 times
b1a8fae
1 year, 10 months ago
Also I understand it cannot be A because it says "take the categorical features" as opposed to the more specific "take the encoded categorical features" in B
upvoted 1 times
...
...

Topic 1 Question 168


You need to develop an image classification model by using a large dataset that contains labeled images in a Cloud Storage bucket. What should you do?

  • A. Use Vertex AI Pipelines with the Kubeflow Pipelines SDK to create a pipeline that reads the images from Cloud Storage and trains the model.
  • B. Use Vertex AI Pipelines with TensorFlow Extended (TFX) to create a pipeline that reads the images from Cloud Storage and trains the model.
  • C. Import the labeled images as a managed dataset in Vertex AI and use AutoML to train the model.
  • D. Convert the image dataset to a tabular format using Dataflow Load the data into BigQuery and use BigQuery ML to train the model.
Suggested Answer: C 🗳️

Comments

OpenKnowledge
3 weeks, 6 days ago
Selected Answer: B
The problem hasn't specified how large the dataset is. However, if "large dataset" indicates more than 200M rows, then AutoML has limitations on such a large dataset. So, considering the vagueness of this problem, option B is the safer choice.
upvoted 1 times
...
NamitSehgal
8 months, 3 weeks ago
Selected Answer: C
leverage Vertex AI's AutoML capabilities to automatically build a high-quality image classification model
upvoted 1 times
...
sekhrivijay
10 months ago
Selected Answer: B
Managed datasets have a size limitation of 100GB. The question states "a large dataset". Unmanaged datasets have no size limitation. Assuming "large" here implies > 100GB, it should eliminate answer C.
upvoted 3 times
...
f084277
12 months ago
Selected Answer: C
You're just trying to TRAIN A MODEL, not set up a whole pipeline. Answer is clearly C
upvoted 2 times
...
AzureDP900
1 year, 4 months ago
B is right in my opinion, while both options C and B involve importing labeled images into Vertex AI, using AutoML for image classification might not be the most suitable choice. TFX is a more specialized tool that provides a robust pipeline framework specifically designed for image classification tasks, making it a better fit for this particular use case.
upvoted 1 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: C
https://cloud.google.com/vertex-ai/docs/tutorials/image-classification-automl/dataset
upvoted 1 times
pinimichele01
1 year, 6 months ago
no need to use a pipeline, automl is ok
upvoted 1 times
...
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: B
My answer: B
TensorFlow Extended (TFX) and Kubeflow provide capabilities for building machine learning pipelines that can handle data stored in Google Cloud Storage (GCS). However, when it comes to ease of use specifically for working with data in GCS, TFX may have a slight edge over Kubeflow:
1 - Integration with GCS: TFX is tightly integrated with TensorFlow, which has built-in support for GCS and provides convenient APIs for reading data directly from GCS buckets.
2 - Abstraction of data handling: TFX provides higher-level abstractions and components specifically designed for common machine learning tasks, including data preprocessing, model training, and model evaluation.
upvoted 4 times
pinimichele01
1 year, 6 months ago
Which SDK to use?
• If you use TensorFlow in an ML workflow that processes terabytes of structured data or text data -> TFX
• For other use cases -> KFP
upvoted 2 times
...
...
winston9
1 year, 10 months ago
Selected Answer: C
It's C
upvoted 3 times
...
BlehMaks
1 year, 10 months ago
Selected Answer: A
95th is the similar question. https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline#sdk
upvoted 1 times
winston9
1 year, 9 months ago
95 is a similar question but it does not offer Vertex AI AutoML as an option. which I think it's the right answer here consider the little amount of info provided in the question
upvoted 1 times
...
...
b1a8fae
1 year, 10 months ago
Selected Answer: C
Very vaguely put. I choose C over B just because it sounds like a simpler approach, but both should theoretically work.
upvoted 2 times
...

Topic 1 Question 169


You are developing a model to detect fraudulent credit card transactions. You need to prioritize detection, because missing even one fraudulent transaction could severely impact the credit card holder. You used AutoML to train a model on users' profile information and credit card transaction data. After training the initial model, you notice that the model is failing to detect many fraudulent transactions. How should you adjust the training parameters in AutoML to improve model performance? (Choose two.)

  • A. Increase the score threshold
  • B. Decrease the score threshold.
  • C. Add more positive examples to the training set
  • D. Add more negative examples to the training set
  • E. Reduce the maximum number of node hours for training
Suggested Answer: B 🗳️

Comments

tardigradum
Highly Voted 1 year, 3 months ago
Selected Answer: B
B&C. If we want to increase the detection rate of fraudulent transactions, we can lower the classification threshold. By doing so, the model becomes less strict and classifies more transactions as potentially fraudulent. This implies including a higher number of false positives in our results. To improve the performance, we can also add more fraudulent transaction examples to the dataset (fraudulent transactions are the positives, in this case).
upvoted 5 times
...
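The threshold effect described above can be seen in a tiny numeric sketch (the scores and labels below are invented for illustration): lowering the score threshold raises recall on the fraud class, at the cost of more false positives.

```python
# Made-up fraud scores and true labels (1 = fraudulent) for illustration.
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1,    1,    0,    1,    0,    0]

def recall_at(threshold):
    """Fraction of true frauds flagged when predicting fraud for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return tp / (tp + fn)

print(recall_at(0.5))  # ~0.667: the fraud with score 0.40 is missed
print(recall_at(0.2))  # 1.0: lowering the threshold catches all three frauds
```

The same lower threshold also flags the legitimate transactions with scores 0.60 and 0.30, which is the false-positive cost the comments mention.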
NamitSehgal
Most Recent 8 months, 3 weeks ago
Selected Answer: B
B. Decrease the score threshold and C. Add more positive examples to the training set.
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: B
B & D
D. Add more negative examples to the training set: fraudulent transactions are typically a minority compared to legitimate transactions. By increasing the number of negative examples (fraudulent transactions) in your training data, you provide AutoML with more information about the patterns of fraudulent activity. This can help the model better distinguish between legitimate and fraudulent transactions.
B. Decrease the score threshold: the score threshold determines the level of suspicion assigned to a transaction by the model. A lower threshold means the model flags more transactions as suspicious, potentially catching more fraudulent activities. However, this might also lead to an increase in false positives (flagging legitimate transactions). You'll need to find a balance between fraud detection and acceptable false positive rates based on your business needs.
upvoted 3 times
fitri001
1 year, 6 months ago
A. Increase the score threshold: this would make the model more conservative and less likely to flag fraudulent transactions, potentially missing actual fraud.
C. Add more positive examples (legitimate transactions): while having a balanced dataset is important, in this case, prioritizing fraud detection suggests focusing on improving the model's ability to identify fraudulent transactions (negative examples) rather than adding more legitimate ones.
E. Reduce the maximum number of node hours for training: reducing training time might limit the model's ability to learn complex patterns, potentially hindering its performance.
upvoted 1 times
...
pinimichele01
1 year, 6 months ago
positive is fraudulent.. aka minority class
upvoted 3 times
...
tardigradum
1 year, 3 months ago
Positive is fraudulent in this case, so B & C
upvoted 2 times
...
...
shadz10
1 year, 10 months ago
B&C - Fraudulent transactions are often rare events, so the model might not have enough exposure to learn their patterns effectively.
upvoted 2 times
...
36bdc1e
1 year, 10 months ago
B & C They are the options
upvoted 2 times
...
BlehMaks
1 year, 10 months ago
Selected Answer: C
BC
B. More suspicious transactions are marked as fraudulent.
C. Usually real fraudulent transactions are rare in datasets, so we need to add more examples to make our model focus more on them.
upvoted 3 times
...
pikachu007
1 year, 10 months ago
Selected Answer: B
B & D
B. Decrease the score threshold: this adjustment could make the model more sensitive, potentially reducing the chance of missing fraudulent transactions, but might increase false positives.
D. Add more negative examples to the training set: providing more examples of non-fraudulent transactions could help the model better distinguish between legitimate and fraudulent transactions, improving its overall performance.
upvoted 2 times
tavva_prudhvi
1 year, 6 months ago
Option D's approach could be beneficial in a scenario where the model is overfitting to the fraudulent (positive) cases due to an imbalance in the training data favoring fraudulent examples. But, as per the question "model is failing to detect many fraudulent transactions"
upvoted 1 times
...
...
b1a8fae
1 year, 10 months ago
Selected Answer: C
Regarding the 2nd choice (did not notice), I would choose C: adding more positive examples to the training set. It did not sound like a change of parameter to me, but apparently AutoML allows parametrization of data split: https://cloud.google.com/vertex-ai/docs/general/ml-use. I am not entirely convinced but it seems more likely than any other option (reducing max number of hours per node for training can only affect performance negatively I reckon?)
upvoted 2 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: B
B. Decreasing the score threshold will cause the model to make more positive predictions and potentially decrease the number of false negatives (non detected fraudulent transactions)
upvoted 2 times
...

Topic 1 Question 170


You need to deploy a scikit-learn classification model to production. The model must be able to serve requests 24/7, and you expect millions of requests per second to the production application from 8 am to 7 pm. You need to minimize the cost of deployment. What should you do?

  • A. Deploy an online Vertex AI prediction endpoint. Set the max replica count to 1
  • B. Deploy an online Vertex AI prediction endpoint. Set the max replica count to 100
  • C. Deploy an online Vertex AI prediction endpoint with one GPU per replica. Set the max replica count to 1
  • D. Deploy an online Vertex AI prediction endpoint with one GPU per replica. Set the max replica count to 100
Suggested Answer: B 🗳️

Comments

pikachu007
Highly Voted 1 year, 10 months ago
Selected Answer: B
B. Deploy an online Vertex AI prediction endpoint. Set the max replica count to 100: This option provides a higher number of replicas (100) to handle the expected high volume of requests during peak hours. While it might result in increased costs, it provides the necessary scalability to manage the incoming traffic efficiently. During non-peak hours, you can consider scaling down the replicas to reduce costs, as Vertex AI allows dynamic scaling based on demand.
upvoted 6 times
...
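A back-of-envelope sketch of why a high max replica count plus autoscaling minimizes cost (all throughput numbers below are invented): with autoscaling you only pay for the replicas the current traffic requires, so a large cap covers the daytime peak while the overnight footprint stays small.

```python
import math

def replicas_needed(peak_rps, rps_per_replica):
    """Minimum replicas needed to absorb a given request rate (ceiling division)."""
    return math.ceil(peak_rps / rps_per_replica)

# Hypothetical numbers: daytime peak vs. overnight trickle.
print(replicas_needed(2_000_000, 20_000))  # 100 replicas during the 8am-7pm peak
print(replicas_needed(40_000, 20_000))     # 2 replicas overnight
```

With max replica count 1 (options A and C), the endpoint could never scale past one machine at peak, regardless of cost.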
AzureDP900
Most Recent 1 year, 4 months ago
Option A (Deploying an online Vertex AI prediction endpoint. Set the max replica count to 1) is still a good choice for minimizing costs. By setting the max replica count to 1, you are allowing Vertex AI to scale up or down based on load, which means that during off-peak hours, you won't be paying for unnecessary instances.
upvoted 1 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: B
see pikachu007
upvoted 1 times
...
36bdc1e
1 year, 10 months ago
B. We don't need a GPU for scikit-learn.
upvoted 2 times
...
BlehMaks
1 year, 10 months ago
Selected Answer: B
scikit-learn doesn't support GPU https://scikit-learn.org/stable/faq.html#will-you-add-gpu-support
upvoted 4 times
...
b1a8fae
1 year, 10 months ago
B. scikit-learn -> no need for GPU; max number of replicas -> 1 is too little if we are serving online predictions at such a massive scale (millions per second)
upvoted 2 times
...
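For reference, the deployment settings option B describes can be sketched as follows. This is a minimal sketch assuming the google-cloud-aiplatform Python SDK; the model resource name is hypothetical and the actual deploy call is left commented out. The point is the replica bounds: a min of 1 keeps the endpoint serving 24/7 at minimal off-peak cost, a max of 100 lets autoscaling absorb the daytime peak, and no GPUs are attached because scikit-learn cannot use them.

```python
# Sketch of option B's deployment settings (assumes the google-cloud-aiplatform
# SDK; the model resource name below is hypothetical).
deploy_kwargs = {
    "deployed_model_display_name": "sklearn-classifier",
    "machine_type": "n1-standard-4",  # CPU only: scikit-learn has no GPU support
    "min_replica_count": 1,           # one replica keeps the endpoint up 24/7
    "max_replica_count": 100,         # autoscaling headroom for the daytime peak
}

# With the SDK installed and the model uploaded, the deployment would be:
# from google.cloud import aiplatform
# model = aiplatform.Model("projects/my-project/locations/us-central1/models/123")
# endpoint = model.deploy(**deploy_kwargs)

print(deploy_kwargs)
```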

Topic 1 Question 171

Exam Professional Machine Learning Engineer topic 1 question 171 discussion

You work with a team of researchers to develop state-of-the-art algorithms for financial analysis. Your team develops and debugs complex models in TensorFlow. You want to maintain the ease of debugging while also reducing the model training time. How should you set up your training environment?

  • A. Configure a v3-8 TPU VM. SSH into the VM to train and debug the model.
  • B. Configure a v3-8 TPU node. Use Cloud Shell to SSH into the Host VM to train and debug the model.
  • C. Configure an n1-standard-4 VM with 4 NVIDIA P100 GPUs. SSH into the VM and use ParameterServerStrategy to train the model.
  • D. Configure an n1-standard-4 VM with 4 NVIDIA P100 GPUs. SSH into the VM and use MultiWorkerMirroredStrategy to train the model.
Suggested Answer: D 🗳️

Comments

pikachu007
Highly Voted 1 year, 10 months ago
Selected Answer: D
Given the need to balance ease of debugging and reduce training time for complex models in TensorFlow, option D - "Configure an n1-standard-4 VM with 4 NVIDIA P100 GPUs. SSH into the VM and use MultiWorkerMirroredStrategy to train the model" appears to be more suitable. This setup utilizes NVIDIA P100 GPUs for computational power and employs MultiWorkerMirroredStrategy, which can distribute the workload across GPUs efficiently, potentially reducing training time while maintaining a relatively straightforward environment for debugging.
upvoted 5 times
...
Rafa1312
Most Recent 3 weeks, 3 days ago
Selected Answer: A
I will go with A. This problem is perfectly suited for TPUs, and this is the much easier option.
upvoted 1 times
...
5091a99
8 months, 1 week ago
Selected Answer: D
Answer D. - GPUs are more accurate at complex numerical calculations than TPUs. - MultiWorkerMirroredStrategy will train on multiple machines.
upvoted 3 times
...
NamitSehgal
8 months, 3 weeks ago
Selected Answer: A
complex TensorFlow models use TPUs
upvoted 2 times
...
JDpmle2024
1 year ago
How would D be correct: D. Configure a n1-standard-4 VM with 4 NVIDIA P100 GPUs. SSH into the VM and use MultiWorkerMirroredStrategy to train the model. This is a single VM. The MultiWorkerMirroredStrategy is for multiple VMs. Based on this, choosing A.
upvoted 1 times
...
baimus
1 year, 2 months ago
MultiWorkerMirroredStrategy is for multiple workers, each with one or more GPUs. For a single worker/vm with multiple GPUs it would be MirroredStrategy, so D is definitely wrong. C is wrong as that is a totally unrelated concept, B is probably wrong as it's much less convenient than using a terminal (B vs A is tough call, but A replicates their existing setup most closely)
upvoted 3 times
...
AzureDP900
1 year, 4 months ago
Option D Configure a n1-standard-4 VM with 4 NVIDIA P100 GPUs. SSH into the VM and use MultiWorkerMirroredStrategy to train the model. is indeed a correct answer. MultiWorkerMirroredStrategy: This strategy allows you to distribute your training process across multiple machines (in this case, the 4 NVIDIA P100 GPUs) while maintaining synchronization between them. NVIDIA P100 GPUs: These high-performance GPUs are well-suited for computationally intensive tasks like deep learning model training.
upvoted 2 times
...
inc_dev_ml_001
1 year, 4 months ago
Selected Answer: A
It says "state-of-art" and TPU is more recent than GPU. No need to log using Cloud Shell into VM and there's no mention about cost. So TPU + SSH directly into VM could be the choice.
upvoted 4 times
...
fitri001
1 year, 6 months ago
Selected Answer: D
Debugging Ease: SSHing into a VM provides a familiar environment for researchers to use familiar debugging tools within the VM for their complex TensorFlow models. This maintains ease of debugging compared to TPUs which require special considerations. Faster Training: Utilizing 4 NVIDIA P100 GPUs within the VM leverages parallel processing capabilities to significantly accelerate training compared to a CPU-only VM.
upvoted 3 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: D
the need to balance ease of debugging and reduce training time
upvoted 2 times
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: D
My choice is D. While TPUs offer faster training, they can be less convenient for debugging due to limitations in tooling and visualization, such as the lack of support for some debuggers and limited visualization options. Comparing options C and D, MultiWorkerMirroredStrategy uses synchronous distributed training across multiple workers, making it easier to inspect intermediate states and variables during debugging. In contrast, ParameterServerStrategy utilizes asynchronous multi-machine training, which can be less intuitive to debug. However, it's important to note that ParameterServerStrategy might be more efficient for training extremely large models. Therefore, considering the specific need for ease of debugging in this scenario, MultiWorkerMirroredStrategy appears to be the more suitable choice.
upvoted 3 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: D
D. Cannot be B, because the node architecture makes it difficult to debug: https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#tpu-node-arch While TPUs are faster than GPUs for certain scenarios, and never slower, they are less easy to debug. Parallelizing the training across different workers (GPUs) using MultiWorkerMirroredStrategy makes most sense to me.
upvoted 1 times
...

Topic 1 Question 172

Exam Professional Machine Learning Engineer topic 1 question 172 discussion

You created an ML pipeline with multiple input parameters. You want to investigate the tradeoffs between different parameter combinations. The parameter options are:
• Input dataset
• Max tree depth of the boosted tree regressor
• Optimizer learning rate

You need to compare the pipeline performance of the different parameter combinations measured in F1 score, time to train, and model complexity. You want your approach to be reproducible, and track all pipeline runs on the same platform. What should you do?

  • A. 1. Use BigQuery ML to create a boosted tree regressor, and use the hyperparameter tuning capability.
    2. Configure the hyperparameter syntax to select different input datasets, max tree depths, and optimizer learning rates. Choose the grid search option.
  • B. 1. Create a Vertex AI pipeline with a custom model training job as part of the pipeline. Configure the pipeline’s parameters to include those you are investigating.
    2. In the custom training step, use the Bayesian optimization method with F1 score as the target to maximize.
  • C. 1. Create a Vertex AI Workbench notebook for each of the different input datasets.
    2. In each notebook, run different local training jobs with different combinations of the max tree depth and optimizer learning rate parameters.
    3. After each notebook finishes, append the results to a BigQuery table.
  • D. 1. Create an experiment in Vertex AI Experiments.
    2. Create a Vertex AI pipeline with a custom model training job as part of the pipeline. Configure the pipeline’s parameters to include those you are investigating.
    3. Submit multiple runs to the same experiment, using different values for the parameters.
Suggested Answer: D 🗳️

Comments

fitri001
1 year ago
Selected Answer: D
Vertex AI Experiments: This service allows you to group and track different pipeline runs associated with the same experiment. This facilitates comparing runs with various parameter combinations.
Vertex AI Pipelines: Pipelines enable you to define a workflow for training your model. You can include a custom training step within the pipeline and configure its parameters as needed. This ensures reproducibility, as all runs follow the same defined workflow.
Submitting multiple runs: By submitting multiple pipeline runs to the same experiment with different parameter values, you can efficiently explore various configurations and track their performance metrics like F1 score, training time, and model complexity within Vertex AI Experiments.
upvoted 4 times
fitri001
1 year ago
A. BigQuery ML: BigQuery ML doesn't offer functionalities like Vertex AI Pipelines for building and managing workflows. It also lacks experiment tracking capabilities. C. Vertex AI Workbench notebooks: While Vertex AI Workbench provides notebooks for running training jobs, this approach wouldn't be reproducible. Each notebook would be a separate entity, making it difficult to track runs and manage different parameter combinations.
upvoted 2 times
...
...
pinimichele01
1 year, 1 month ago
Selected Answer: D
Vertex AI Experiment was created to compare runs.
upvoted 1 times
...
36bdc1e
1 year, 4 months ago
D. The best option for investigating the tradeoffs between different parameter combinations is to create an experiment in Vertex AI Experiments.
upvoted 2 times
...
BlehMaks
1 year, 4 months ago
Selected Answer: D
Vertex AI Experiment was created to compare runs. A is incorrect because you can't create a boosted tree using BigQueryML https://cloud.google.com/bigquery/docs/bqml-introduction#supported_models
upvoted 1 times
...
pikachu007
1 year, 4 months ago
Selected Answer: D
Given the objective of investigating parameter tradeoffs while ensuring reproducibility and tracking, option D - "Create an experiment in Vertex AI Experiments and submit multiple runs to the same experiment, using different values for the parameters" seems to be the most suitable. This approach provides a structured and trackable environment within Vertex AI Experiments, allowing multiple runs with varied parameters to be monitored for F1 score, training times, and potentially model complexity, enabling a comprehensive analysis of parameter combinations' tradeoffs.
upvoted 1 times
...
vale_76_na_xxx
1 year, 4 months ago
I go with D : https://cloud.google.com/vertex-ai/docs/evaluation/introduction#tabular
upvoted 1 times
...
b1a8fae
1 year, 4 months ago
Selected Answer: D
You want to investigate tradeoffs between different parameter combinations and track all runs on the same platform -> clearly D. Vertex AI experiments etcetera.
upvoted 1 times
...
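The run-per-combination workflow in option D can be sketched as below. The dataset paths and parameter values are made-up placeholders, and the Vertex AI SDK calls that would submit each pipeline run to the experiment are only indicated in comments; the runnable part is just the parameter grid.

```python
import itertools

# Hypothetical values for the three pipeline parameters from the question
datasets = ["gs://my-bucket/train_v1.csv", "gs://my-bucket/train_v2.csv"]
max_tree_depths = [4, 8]
learning_rates = [0.01, 0.1]

# One pipeline run per parameter combination, all tracked in one experiment
combos = list(itertools.product(datasets, max_tree_depths, learning_rates))
for dataset, depth, lr in combos:
    params = {"input_dataset": dataset, "max_tree_depth": depth, "learning_rate": lr}
    # With the Vertex AI SDK this would be roughly:
    # job = aiplatform.PipelineJob(display_name="sales-forecast",
    #                              template_path="pipeline.json",
    #                              parameter_values=params)
    # job.submit(experiment="sales-forecast-experiment")

print(len(combos))  # 2 datasets x 2 depths x 2 learning rates = 8 runs
```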

Topic 1 Question 173

Exam Professional Machine Learning Engineer topic 1 question 173 discussion

You received a training-serving skew alert from a Vertex AI Model Monitoring job running in production. You retrained the model with more recent training data, and deployed it back to the Vertex AI endpoint, but you are still receiving the same alert. What should you do?

  • A. Update the model monitoring job to use a lower sampling rate.
  • B. Update the model monitoring job to use the more recent training data that was used to retrain the model.
  • C. Temporarily disable the alert. Enable the alert again after a sufficient amount of new production traffic has passed through the Vertex AI endpoint.
  • D. Temporarily disable the alert until the model can be retrained again on newer training data. Retrain the model again after a sufficient amount of new production traffic has passed through the Vertex AI endpoint.
Suggested Answer: B 🗳️

Comments

OpenKnowledge
1 month, 3 weeks ago
Selected Answer: B
To update model monitoring skew alerts after a model retraining, you should reconfigure the existing monitoring job to use the new training data as its reference dataset, which is crucial for accurate training-serving skew detection. On platforms like Google Cloud's Vertex AI, this typically involves editing the model's endpoint settings or creating a new monitoring job to point to the retrained model and its corresponding training data. After updating, you'll need to monitor the new alerts to confirm the retraining has resolved the original skew issue.
upvoted 1 times
...
8619d79
9 months ago
Selected Answer: D
This approach acknowledges that the skew might be due to a mismatch between the training data and the current production data distribution. By waiting for sufficient new production traffic, you can collect a more representative dataset that reflects the current state of the production environment. Retraining the model on this new data ensures that the model is better aligned with the production data distribution, which should resolve the skew.
upvoted 1 times
...
lunalongo
11 months, 1 week ago
Selected Answer: C
C) The model is adapting to the changing data distribution in production; disabling alerts temporarily gives the model a chance to adjust to the new data.
A) would hide/mask the skew; B) doesn't make sense because the monitoring job already uses the most recently trained data, it's just different from production data; D) is a reactive, short-term solution.
upvoted 1 times
...
AzureDP900
1 year, 4 months ago
C. Temporarily disable the alert. Enable the alert again after a sufficient amount of new production traffic has passed through the Vertex AI endpoint. Here's why: You've already retrained the model with more recent training data and deployed it back to the Vertex AI endpoint, but the alert persists. This suggests that the model is still adapting to the changing data distribution in production. Temporarily disabling the alert will give the model a chance to adjust to the new data distribution before the monitoring job starts firing alerts again. Once enough new traffic has passed through, you can re-enable the alert and continue monitoring the model's performance.
upvoted 1 times
...
info_appsatori
1 year, 4 months ago
Selected Answer: B
The baseline is calculated when you create a Vertex AI Model Monitoring job, and is only recalculated if you update the training dataset for the job.
upvoted 3 times
...
SahandJ
1 year, 6 months ago
Is B actually the correct answer? According to the documentation, training-serving skew detection can only be enabled if the original training data is available. Furthermore, the baseline is automatically recalculated when the training data is updated. So does this question imply that the model is trained on data without updating the original training-dataset? If so then B is clearly correct. If they updated the training dataset with new data and then retrained the model then the model monitoring job's baseline should automatically have been recalculated. I see no other valid answers in that case?
upvoted 2 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: B
This option can help align the baseline distribution of the model monitoring job with the current distribution of the production data, and eliminate the false positive alerts.
upvoted 1 times
...
36bdc1e
1 year, 10 months ago
B This option can help align the baseline distribution of the model monitoring job with the current distribution of the production data, and eliminate the false positive alerts.
upvoted 3 times
...
BlehMaks
1 year, 10 months ago
Selected Answer: B
the cause of the issue could be that the developer forgot to switch their monitoring job to the latest training dataset and the monitoring job still compares prod data with old training dataset and they of course have a skew
upvoted 3 times
...
pikachu007
1 year, 10 months ago
Selected Answer: B
B. Update the model monitoring job to use the more recent training data that was used to retrain the model: This option directly aligns the model monitoring with the recently retrained model and ensures that the monitoring job reflects the characteristics of the latest training data.
upvoted 1 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: D
A. Changing the sampling rate affects not training skew but cost efficiency: https://cloud.google.com/vertex-ai/docs/model-monitoring/overview#considerations B. The model monitoring job is already using the most recent data to detect skew. C&D are the same, except for D being more specific, so I would tend towards D.
upvoted 1 times
...

Topic 1 Question 174

Exam Professional Machine Learning Engineer topic 1 question 174 discussion

You developed a custom model by using Vertex AI to forecast the sales of your company’s products based on historical transactional data. You anticipate changes in the feature distributions and the correlations between the features in the near future. You also expect to receive a large volume of prediction requests. You plan to use Vertex AI Model Monitoring for drift detection and you want to minimize the cost. What should you do?

  • A. Use the features for monitoring. Set a monitoring-frequency value that is higher than the default.
  • B. Use the features for monitoring. Set a prediction-sampling-rate value that is closer to 1 than 0.
  • C. Use the features and the feature attributions for monitoring. Set a monitoring-frequency value that is lower than the default.
  • D. Use the features and the feature attributions for monitoring. Set a prediction-sampling-rate value that is closer to 0 than 1.
Suggested Answer: D 🗳️

Comments

fitri001
Highly Voted 1 year ago
Selected Answer: D
Feature and Feature Attribution Monitoring: Since you anticipate changes in feature distributions and correlations, monitoring both features and their attributions provides a more comprehensive view of potential drift. Feature attributions explain how each feature contributes to the model's predictions; monitoring them helps identify if these contributions are changing as expected.
Lower Prediction Sampling Rate: This reduces the cost associated with Vertex AI Model Monitoring. The sampling rate determines the percentage of prediction requests used for monitoring calculations. A lower rate reduces the number of predictions analyzed, lowering monitoring costs. However, it's important to strike a balance between cost and having enough data for drift detection.
upvoted 6 times
...
OpenKnowledge
Most Recent 1 month, 3 weeks ago
Selected Answer: D
In the context of machine learning, a prediction sampling rate refers to the percentage of incoming requests to a deployed model that are logged and analyzed for monitoring purposes. It is used to balance the need for performance monitoring with the cost and computational overhead of analyzing every prediction. For example, on Google Cloud's Vertex AI, you can set a prediction-sampling-rate for a model monitoring job. If you set the rate to 0.5, only 50% of the prediction requests will be logged and analyzed for potential issues like data drift or skew. It is often not necessary or cost-effective to monitor every single prediction request, especially for high-traffic models. A sampling rate allows you to monitor a statistically representative subset of the data.
upvoted 1 times
...
BlehMaks
1 year, 4 months ago
Selected Answer: D
if we expect a large volume of prediction requests then pick D. if we expect the changes to be infrequent then C https://cloud.google.com/vertex-ai/docs/model-monitoring/overview#considerations
upvoted 3 times
...
pikachu007
1 year, 4 months ago
Selected Answer: D
Given the need to minimize costs while addressing changes in feature distributions and correlations, option D - "Use the features and the feature attributions for monitoring. Set a prediction-sampling-rate value that is closer to 0 than 1" seems to be a reasonable choice. This option allows monitoring both features and feature attributions, offering insights into changes in feature importance, while the lower prediction-sampling-rate helps manage costs by monitoring a subset of predictions. It's a trade-off between cost efficiency and the need for effective drift detection
upvoted 2 times
...
b1a8fae
1 year, 4 months ago
Selected Answer: D
Not A. because higher monitoring frequency, higher cost. Not B. because higher prediction request sample rate, higher cost. Between the remaining 2, better to lower the prediction request sample rate so only a small fraction of the latest data is evaluated for drift, also because lots of data are expected so a small perecentage should suffice to detect drift.
upvoted 2 times
...
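To make the cost argument concrete, a back-of-the-envelope sketch (the request volume and rates are made-up numbers): the prediction-sampling-rate is simply the fraction of prediction requests that Model Monitoring logs and analyzes, so at high traffic a rate near 0 cuts monitoring volume, and therefore cost, dramatically.

```python
# Assumed traffic volume (hypothetical)
requests_per_day = 10_000_000

# prediction-sampling-rate: fraction of requests logged for monitoring
rate_near_one = 0.9    # option B style: almost everything analyzed
rate_near_zero = 0.05  # option D style: a small sample

analyzed_high = round(requests_per_day * rate_near_one)
analyzed_low = round(requests_per_day * rate_near_zero)

print(analyzed_high)  # 9000000 requests analyzed per day
print(analyzed_low)   # 500000 requests: still a large sample for drift statistics
```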

Topic 1 Question 175

Exam Professional Machine Learning Engineer topic 1 question 175 discussion

You have recently trained a scikit-learn model that you plan to deploy on Vertex AI. This model will support both online and batch prediction. You need to preprocess input data for model inference. You want to package the model for deployment while minimizing additional code. What should you do?

  • A. 1. Upload your model to the Vertex AI Model Registry by using a prebuilt scikit-learn prediction container.
    2. Deploy your model to Vertex AI Endpoints, and create a Vertex AI batch prediction job that uses the instanceConfig.instanceType setting to transform your input data.
  • B. 1. Wrap your model in a custom prediction routine (CPR), and build a container image from the CPR local model.
    2. Upload your scikit-learn model container to Vertex AI Model Registry.
    3. Deploy your model to Vertex AI Endpoints, and create a Vertex AI batch prediction job.
  • C. 1. Create a custom container for your scikit-learn model.
    2. Define a custom serving function for your model.
    3. Upload your model and custom container to Vertex AI Model Registry.
    4. Deploy your model to Vertex AI Endpoints, and create a Vertex AI batch prediction job.
  • D. 1. Create a custom container for your scikit-learn model.
    2. Upload your model and custom container to Vertex AI Model Registry.
    3. Deploy your model to Vertex AI Endpoints, and create a Vertex AI batch prediction job that uses the instanceConfig.instanceType setting to transform your input data.
Suggested Answer: B 🗳️

Comments

b1a8fae
Highly Voted 1 year, 4 months ago
Selected Answer: B
I go with B: “Custom prediction routines (CPR) lets you build custom containers with pre/post processing code easily, without dealing with the details of setting up an HTTP server or building a container from scratch.” (https://cloud.google.com/vertex-ai/docs/predictions/custom-prediction-routines). This alone makes B preferable to C and D, provided lack of complex model architecture. Regarding A, pre-built containers only allow serving predictions, but not preprocessing of data (https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers#use_a_prebuilt_container). B thus remains the most likely option.
upvoted 6 times
...
shadz10
Highly Voted 1 year, 3 months ago
Selected Answer: B
B - Creating a custom container without CPR adds additional complexity. i.e. write model server write dockerfile and also build and upload image. Where as using a CPR only requires writing a predictor and using vertex SDK to build image. https://cloud.google.com/vertex-ai/docs/predictions/custom-prediction-routines
upvoted 5 times
...
OpenKnowledge
Most Recent 1 month, 3 weeks ago
Selected Answer: B
Custom prediction routines (CPR) offer an easier way to add preprocessing and postprocessing logic to your models in Vertex AI, as they handle the underlying infrastructure like HTTP servers, while pre-built containers are simpler for standard inference but lack this flexibility, requiring a full custom container for complex custom code.
upvoted 1 times
...
desertlotus1211
8 months, 1 week ago
Selected Answer: A
You want to minimize code; for all the other options you need more code.
upvoted 1 times
...
bobjr
11 months, 1 week ago
Selected Answer: B
https://cloud.google.com/vertex-ai/docs/predictions/custom-prediction-routines
upvoted 1 times
...
gscharly
1 year ago
Selected Answer: B
agree with shadz10
upvoted 1 times
...
guilhermebutzke
1 year, 3 months ago
Selected Answer: C
My choice: C. Option C ensures that the scikit-learn model is properly packaged, deployed, and integrated with Vertex AI services while minimizing the need for additional code beyond what is necessary for customizing the serving function. Option B is not considered correct because it suggests wrapping the scikit-learn model in a custom prediction routine (CPR), which might not be the most suitable approach for deploying scikit-learn models on Vertex AI. Options A and D use instanceConfig, which is limited for preprocessing, and uploading the container without a serving function won't work.
upvoted 1 times
...
pikachu007
1 year, 4 months ago
Selected Answer: D
Considering the goal of minimizing additional code and complexity, option D - "Create a custom container for your scikit-learn model, upload your model and custom container to Vertex AI Model Registry, deploy your model to Vertex AI Endpoints, and create a Vertex AI batch prediction job that uses the instanceConfig.instanceType setting to transform your input data" seems to be a more straightforward and efficient approach. It involves customizing the container for the scikit-learn model, leveraging the Vertex AI Model Registry, and utilizing the specified instance type for batch prediction without introducing unnecessary complexity like custom prediction routines.
upvoted 1 times
...
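To illustrate why a CPR minimizes code, the sketch below mimics the shape of a custom prediction routine's predictor class (load / preprocess / predict / postprocess). It is a toy stand-in: a real CPR would subclass the Predictor class from the google-cloud-aiplatform SDK and unpickle the scikit-learn model in load(), and the threshold model and scaling here are invented for the example.

```python
class SketchPredictor:
    """Toy stand-in for a Vertex AI custom prediction routine (CPR) predictor.

    A real implementation would subclass aiplatform's Predictor and be
    packaged into a container image; the method layout is the same.
    """

    def load(self, artifacts_uri: str) -> None:
        # A real CPR would download and unpickle the scikit-learn model here.
        self._threshold = 0.5

    def preprocess(self, prediction_input: dict) -> list:
        # Example preprocessing: scale raw values into [0, 1] by the max value.
        instances = prediction_input["instances"]
        max_value = max(instances)
        return [x / max_value for x in instances]

    def predict(self, instances: list) -> list:
        return [1 if x >= self._threshold else 0 for x in instances]

    def postprocess(self, prediction_results: list) -> dict:
        return {"predictions": prediction_results}


predictor = SketchPredictor()
predictor.load("gs://my-bucket/model/")  # hypothetical artifact location
scaled = predictor.preprocess({"instances": [2.0, 5.0, 10.0]})
print(predictor.postprocess(predictor.predict(scaled)))  # {'predictions': [0, 1, 1]}
```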

Topic 1 Question 176

Exam Professional Machine Learning Engineer topic 1 question 176 discussion

You work for a food product company. Your company’s historical sales data is stored in BigQuery. You need to use Vertex AI’s custom training service to train multiple TensorFlow models that read the data from BigQuery and predict future sales. You plan to implement a data preprocessing algorithm that performs min-max scaling and bucketing on a large number of features before you start experimenting with the models. You want to minimize preprocessing time, cost, and development effort. How should you configure this workflow?

  • A. Write the transformations in Spark using the spark-bigquery-connector, and use Dataproc to preprocess the data.
  • B. Write SQL queries to transform the data in-place in BigQuery.
  • C. Add the transformations as a preprocessing layer in the TensorFlow models.
  • D. Create a Dataflow pipeline that uses the BigQueryIO connector to ingest the data, process it, and write it back to BigQuery.
Suggested Answer: B 🗳️

Comments

cert_pz
1 year ago
Selected Answer: C
Since it is already given that we will be using a TF model and will do experiments exclusively there, I don't see why we wouldn't use TF layers to preprocess the data. We would minimize costs by not having to store additional data. Time would be around the same, as the layer transforms the attribute during training, and development would also be simpler: if you are using Keras it would literally be 2 more lines of code. However, I see the argument for B as well, but I would still go with C in this case. Specifically, I would use a Normalization layer for normalization and a Discretization layer for binning/bucketing.
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: B
In-place Transformation: BigQuery allows you to perform data transformations directly within the data warehouse using SQL queries. This eliminates the need for data movement and reduces processing time compared to other options that involve data transfer.
Minimized Development Effort: Since you're already familiar with SQL, writing queries for min-max scaling and bucketing requires minimal additional development effort compared to learning and implementing new frameworks like Spark or Dataflow.
Cost-Effective: BigQuery's serverless architecture scales processing power based on your workload. This can be more cost-effective than managing separate processing clusters like Dataproc.
upvoted 4 times
...
shadz10
1 year, 10 months ago
Selected Answer: B
B - Keeps the preprocessing algorithm seperate from the model
upvoted 2 times
...
36bdc1e
1 year, 10 months ago
C This option allows you to leverage the power and simplicity of TensorFlow to preprocess and transform the data with simple Python code
upvoted 2 times
...
BlehMaks
1 year, 10 months ago
Selected Answer: B
BigQuery can do both transformations https://cloud.google.com/bigquery/docs/manual-preprocessing#numerical_functions
upvoted 1 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: B
BigQuery (SQL) is the easiest, cheapest approach
upvoted 1 times
...
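For reference, the two transformations in question are arithmetically simple, which is why plain SQL handles them; the Python sketch below (with invented sample values) shows what the in-place BigQuery queries would compute per feature.

```python
def min_max_scale(values):
    """Min-max scaling: map each value linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def bucketize(value, boundaries):
    """Bucketing: index of the half-open bucket that contains value."""
    for i, boundary in enumerate(boundaries):
        if value < boundary:
            return i
    return len(boundaries)

# Invented sample feature values
weekly_sales = [10.0, 20.0, 40.0, 50.0]
print(min_max_scale(weekly_sales))                     # [0.0, 0.25, 0.75, 1.0]
print([bucketize(v, [15, 45]) for v in weekly_sales])  # [0, 1, 1, 2]
```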

Topic 1 Question 177

Exam Professional Machine Learning Engineer topic 1 question 177 discussion

You have created a Vertex AI pipeline that includes two steps. The first step preprocesses 10 TB of data, completes in about 1 hour, and saves the result in a Cloud Storage bucket. The second step uses the processed data to train a model. You need to update the model’s code to allow you to test different algorithms. You want to reduce pipeline execution time and cost while also minimizing pipeline changes. What should you do?

  • A. Add a pipeline parameter and an additional pipeline step. Depending on the parameter value, the pipeline step conducts or skips data preprocessing, and starts model training.
  • B. Create another pipeline without the preprocessing step, and hardcode the preprocessed Cloud Storage file location for model training.
  • C. Configure a machine with more CPU and RAM from the compute-optimized machine family for the data preprocessing step.
  • D. Enable caching for the pipeline job, and disable caching for the model training step.
Suggested Answer: D 🗳️

Comments

lunalongo
11 months, 1 week ago
Selected Answer: B
B) The preprocessing step is already complete and its output is stored in GCS, so a separate, smaller pipeline just for training is the most efficient solution. *A) Conditional logic still performs the preprocessing step when the logic points to not skipping it, increasing costs; C) While reducing preprocessing time, this solution would increase this step's cost; D) Would still include unnecessary preprocessing for each algorithm test before it's cached.
upvoted 2 times
...
fitri001
1 year, 6 months ago
Selected Answer: D
Caching Preprocessed Data: Since the preprocessed data (10 TB) is the same for different model training runs, enabling caching allows Vertex AI to reuse it for subsequent pipeline executions. This significantly reduces execution time and cost, especially for large datasets. Disabling Model Training Cache: Model training is typically non-deterministic due to factors like random initialization. Caching the model training step could lead to stale models and inaccurate results. Disabling caching ensures the model is re-trained each time with potentially updated code for different algorithms.
upvoted 2 times
...
gscharly
1 year, 6 months ago
Selected Answer: D
agree with guilhermebutzke
upvoted 1 times
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: D
According to the documentation cited (https://cloud.google.com/vertex-ai/docs/pipelines/configure-caching), it is possible to write a pipeline setting caching to True or False for each task component, like this:
# Model training step with caching disabled
train_model_task = train_model_op()
train_model_task.set_caching_options(False)  # Disable caching for this step
# Model training step depends on the preprocessing step
train_model_task.after(preprocess_task)
So, with this, letter D is the best option. Furthermore, in letter A, adding a pipeline parameter and an additional pipeline step introduces unnecessary complexity when caching can handle conditional execution efficiently; and in letter C, configuring a machine with more CPU and RAM for preprocessing does not address the goal of minimizing pipeline changes and reducing execution time/cost effectively.
upvoted 4 times
...
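The reuse-or-recompute behavior discussed above can be mimicked with a tiny stdlib-only simulation. Everything below (step names, the hashing scheme, the cache dictionary) is illustrative and not the Vertex AI caching implementation:

```python
# Toy per-step cache: a step whose (name, inputs) match a previous run
# is skipped; the training step opts out so it always re-executes.
import hashlib
import json

_cache = {}

def run_step(name, fn, inputs, enable_cache=True):
    """Run one pipeline step; return (result, was_cache_hit)."""
    key = hashlib.sha256(json.dumps([name, inputs]).encode()).hexdigest()
    if enable_cache and key in _cache:
        return _cache[key], True   # cache hit: computation skipped
    result = fn(inputs)
    _cache[key] = result
    return result, False

# Run 1: both steps execute.
data, hit1 = run_step("preprocess", str.upper, "raw-10tb")
model, _ = run_step("train", lambda d: "model:" + d, data, enable_cache=False)

# Run 2 (testing a new algorithm): preprocessing is reused, training is not.
data2, hit2 = run_step("preprocess", str.upper, "raw-10tb")
```

On the second run only the training step does any work, which is exactly why option D avoids re-processing the 10 TB input on every experiment.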
b1a8fae
1 year, 10 months ago
Selected Answer: D
Not A. Adding a pipeline parameter and new pipeline steps does not minimise pipeline changes. Not C. The idea is not to re-run the preprocessing step at all. Not B. Creating a whole new pipeline implies a significant investment of effort. I opt for D: Enabling caching only for preprocessing job (although it says “pipeline job” in the option, I think that is a typo). Quoting Vertex AI docs: “If there is a matching execution in Vertex ML Metadata, the outputs of that execution are used and the step is skipped. This helps to reduce costs by skipping computations that were completed in a previous pipeline run.” https://cloud.google.com/vertex-ai/docs/pipelines/configure-caching
upvoted 4 times
...
pikachu007
1 year, 10 months ago
Selected Answer: A
The pipeline already generates the preprocessed dataset and stores, there's no need to preprocess again for another model
upvoted 1 times
pikachu007
1 year, 10 months ago
rereading the question, I agree with b1a8fae that it's D
upvoted 1 times
...
...

Topic 1 Question 178


You work for a bank. You have created a custom model to predict whether a loan application should be flagged for human review. The input features are stored in a BigQuery table. The model is performing well, and you plan to deploy it to production. Due to compliance requirements, the model must provide explanations for each prediction. You want to add this functionality to your model code with minimal effort and provide explanations that are as accurate as possible. What should you do?

  • A. Create an AutoML tabular model by using the BigQuery data with integrated Vertex Explainable AI.
  • B. Create a BigQuery ML deep neural network model and use the ML.EXPLAIN_PREDICT method with the num_integral_steps parameter.
  • C. Upload the custom model to Vertex AI Model Registry and configure feature-based attribution by using sampled Shapley with input baselines.
  • D. Update the custom serving container to include sampled Shapley-based explanations in the prediction outputs.
Suggested Answer: C 🗳️

Comments

fitri001
1 year ago
Selected Answer: C
Existing Custom Model: This approach leverages your already-developed, well-performing model. There's no need to rebuild it using AutoML or BigQuery ML, which might require significant code changes. Vertex Explainable AI (XAI): Vertex AI offers XAI integration with custom models through feature-based attribution methods like sampled Shapley. This provides explanations for each prediction without requiring major modifications to your model code. Sampled Shapley with Baselines: Sampled Shapley is a robust attribution method for explaining model predictions. Using input baselines (like zero values) helps improve the interpretability of explanations, especially for features with large ranges.
upvoted 3 times
...
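For intuition on the method named in options C and D: sampled Shapley averages each feature's marginal contribution to the prediction over random feature orderings, relative to an input baseline. A minimal stdlib sketch on a toy linear scorer (the model, weights, and inputs are made up for illustration):

```python
# Stdlib sketch of sampled Shapley attribution on a toy linear model.
import random

def score(x):
    # toy "flag for review" score; weights are invented for illustration
    return 2.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]

def sampled_shapley(f, x, baseline, n_samples=200, seed=0):
    rng = random.Random(seed)
    n = len(x)
    attrib = [0.0] * n
    for _ in range(n_samples):
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)
        prev = f(current)
        for i in order:
            current[i] = x[i]          # reveal feature i
            now = f(current)
            attrib[i] += now - prev    # its marginal contribution
            prev = now
    return [a / n_samples for a in attrib]

attribs = sampled_shapley(score, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
```

For a linear model the attributions recover weight * (x - baseline) exactly, so here they come out to about [2.0, 1.0, 0.0]; Vertex Explainable AI runs this kind of estimate for you once the model and baselines are configured.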
guilhermebutzke
1 year, 3 months ago
Selected Answer: C
According to the documentation at https://cloud.google.com/vertex-ai/docs/explainable-ai/overview, we can utilize both feature-based attribution and sampled Shapley-based explanations. Therefore, for providing explanations for each prediction in a loan classification problem, I believe that feature-based attribution is the optimal approach. Furthermore, updating the custom serving container to include sampled Shapley-based explanations, as suggested in option D, might require more effort, considering that the custom model deployed on Vertex AI already provides this option for explanations.
upvoted 4 times
...
sonicclasps
1 year, 3 months ago
Selected Answer: C
"minimal effort and provide explanations that are as accurate as possible" this makes the answer C, based on this: https://cloud.google.com/vertex-ai/docs/explainable-ai/improving-explanations
upvoted 3 times
...
daidai75
1 year, 3 months ago
Selected Answer: C
Feature attribution is supported for all types of models (both AutoML and custom-trained), frameworks (TensorFlow, scikit, XGBoost), BigQuery ML models, and modalities (images, text, tabular, video). https://cloud.google.com/vertex-ai/docs/explainable-ai/overview
upvoted 3 times
...
36bdc1e
1 year, 4 months ago
C. You find the answer here: https://cloud.google.com/vertex-ai/docs/explainable-ai/overview
upvoted 2 times
...
b1a8fae
1 year, 4 months ago
Selected Answer: D
pikachu007 answer made me reconsider
upvoted 1 times
daidai75
1 year, 3 months ago
https://cloud.google.com/vertex-ai/docs/explainable-ai/overview. According to this web link, Feature attribution is supported for all types of models (both AutoML and custom-trained), frameworks (TensorFlow, scikit, XGBoost), BigQuery ML models, and modalities (images, text, tabular, video).
upvoted 1 times
...
...
b1a8fae
1 year, 4 months ago
Selected Answer: A
Not a deep neural network for sure (B). Out of the remaining 3, A is the simplest approach.
upvoted 1 times
...
pikachu007
1 year, 4 months ago
Selected Answer: D
A and B are out because you already have a model; C does not provide an explanation for each prediction. Therefore D meets all the criteria.
upvoted 2 times
BlehMaks
1 year, 3 months ago
Why does not C provide an explanation for each prediction? As for me both C and D options provide an explanation for each prediction, the difference is only in the amount of effort required to configure explanations
upvoted 1 times
...
...

Topic 1 Question 179


You recently used XGBoost to train a model in Python that will be used for online serving. Your model prediction service will be called by a backend service implemented in Golang running on a Google Kubernetes Engine (GKE) cluster. Your model requires pre- and postprocessing steps. You need to implement the processing steps so that they run at serving time. You want to minimize code changes and infrastructure maintenance, and deploy your model into production as quickly as possible. What should you do?

  • A. Use FastAPI to implement an HTTP server. Create a Docker image that runs your HTTP server, and deploy it on your organization’s GKE cluster.
  • B. Use FastAPI to implement an HTTP server. Create a Docker image that runs your HTTP server. Upload the image to Vertex AI Model Registry and deploy it to a Vertex AI endpoint.
  • C. Use the Predictor interface to implement a custom prediction routine. Build the custom container, upload the container to Vertex AI Model Registry and deploy it to a Vertex AI endpoint.
  • D. Use the XGBoost prebuilt serving container when importing the trained model into Vertex AI. Deploy the model to a Vertex AI endpoint. Work with the backend engineers to implement the pre- and postprocessing steps in the Golang backend service.
Suggested Answer: C 🗳️

Comments

ddogg
Highly Voted 1 year, 9 months ago
Selected Answer: C
Use the Predictor interface to implement a custom prediction routine. This allows you to include the preprocessing and postprocessing steps in the same deployment package as your model. Build the custom container, which packages your model and the associated preprocessing and postprocessing code together, simplifying deployment. Upload the container to Vertex AI Model Registry. This makes your model available for deployment on Vertex AI. Deploy it to a Vertex AI endpoint. This allows your model to be used for online serving. https://blog.thecloudside.com/custom-predict-routines-in-vertex-ai-46a7473c95db
upvoted 7 times
...
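The Predictor interface in option C boils down to a handful of hooks (load, preprocess, predict, postprocess). A stdlib-only mock of that shape, assuming nothing about the real SDK — the class name, scaling, bias, and threshold below are all invented for illustration:

```python
# Stdlib mock of a Predictor-style contract: load / preprocess /
# predict / postprocess, packaged together so processing runs at
# serving time instead of in the Golang backend.
class ToyPredictor:
    def load(self, artifacts_uri):
        # Production code would deserialize the trained XGBoost booster here.
        self._bias = 0.5

    def preprocess(self, instances):
        # e.g. scale raw features before scoring
        return [[v / 100.0 for v in row] for row in instances]

    def predict(self, instances):
        return [sum(row) + self._bias for row in instances]

    def postprocess(self, predictions):
        # shape the response for the Golang caller
        return [{"score": p, "flag": p > 1.0} for p in predictions]

predictor = ToyPredictor()
predictor.load("gs://some-bucket/model")   # placeholder URI
out = predictor.postprocess(
    predictor.predict(predictor.preprocess([[40, 30], [20, 10]]))
)
```

Because all four hooks ship in one container, the Golang service only ever sees the final postprocessed payload, which is the point of option C.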
OpenKnowledge
Most Recent 1 month, 3 weeks ago
Selected Answer: C
A custom prediction routine is a Python function for adding pre- and post-processing logic to model serving, whereas a pre-built container is a Docker image provided by a platform like Google Cloud's Vertex AI that includes a trained model and serves predictions with specific frameworks. You choose a custom routine for flexible, user-defined data transformations before and after prediction, and a pre-built container for simplified deployment when your model uses a supported framework without complex preprocessing needs.
upvoted 1 times
...
Prakzz
1 year, 4 months ago
Selected Answer: B
This approach minimizes code changes and infrastructure maintenance by leveraging Vertex AI's managed services for deployment. Implementing the preprocessing and postprocessing steps in a FastAPI server within a Docker container allows you to handle these steps at serving time efficiently. Deploying this Docker image to a Vertex AI endpoint simplifies the deployment process and reduces the burden of managing the infrastructure.
upvoted 3 times
...
AzureDP900
1 year, 4 months ago
Option C is a good choice if you have specific requirements for preprocessing or postprocessing that can't be met by the prebuilt XGBoost serving container, if you need more control over the deployment process or want to integrate with other services, or if you're comfortable building and managing custom containers. However, if you just want a simple, straightforward way to deploy your model as a RESTful API, Option D (using the XGBoost prebuilt serving container) might be a better fit!
upvoted 1 times
...
livewalk
1 year, 5 months ago
Selected Answer: B
FastAPI allows to create a lightweight HTTP server with minimal code.
upvoted 1 times
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: C
My answer C: Considering pre- and postprocessing implementation, option C directly deals with implementing the processing steps in a custom container, offering full control over their placement and execution. The documentation says: “Custom prediction routines (CPR) lets you build custom containers (https://cloud.google.com/vertex-ai/docs/predictions/use-custom-container) with pre/post processing code easily, without dealing with the details of setting up an HTTP server or building a container from scratch.” https://cloud.google.com/vertex-ai/docs/predictions/custom-prediction-routines So, it is better to use C instead of A or B. C is also better than D, because it offers pre- and post-processing, which D cannot due to its use of a prebuilt serving container.
upvoted 2 times
...
36bdc1e
1 year, 10 months ago
C. Build the custom container, upload the container to Vertex AI Model Registry, and deploy it to a Vertex AI endpoint. This option allows you to leverage the power and simplicity of Vertex AI to serve your XGBoost model with minimal effort and customization. Vertex AI is a unified platform for building and deploying machine learning solutions on Google Cloud. Vertex AI can deploy a trained XGBoost model to an online prediction endpoint, which can provide low-latency predictions for individual instances. A custom prediction routine (CPR) is a Python script that defines the logic for preprocessing the input data, running the prediction, and postprocessing the output data.
upvoted 4 times
...
pikachu007
1 year, 10 months ago
Selected Answer: D
Considering the goal of minimizing code changes, infrastructure maintenance, and quickly deploying the model into production, option D seems to be a pragmatic approach. It leverages the prebuilt XGBoost serving container in Vertex AI, providing a managed environment for serving. The pre- and postprocessing steps can be implemented in the Golang backend service, maintaining consistency with the existing Golang implementation and reducing the need for significant code changes.
upvoted 2 times
...
vale_76_na_xxx
1 year, 10 months ago
I would say D
upvoted 1 times
...

Topic 1 Question 180


You recently deployed a pipeline in Vertex AI Pipelines that trains and pushes a model to a Vertex AI endpoint to serve real-time traffic. You need to continue experimenting and iterating on your pipeline to improve model performance. You plan to use Cloud Build for CI/CD. You want to quickly and easily deploy new pipelines into production, and you want to minimize the chance that the new pipeline implementations will break in production. What should you do?

  • A. Set up a CI/CD pipeline that builds and tests your source code. If the tests are successful, use the Google Cloud console to upload the built container to Artifact Registry and upload the compiled pipeline to Vertex AI Pipelines.
  • B. Set up a CI/CD pipeline that builds your source code and then deploys built artifacts into a pre-production environment. Run unit tests in the pre-production environment. If the tests are successful deploy the pipeline to production.
  • C. Set up a CI/CD pipeline that builds and tests your source code and then deploys built artifacts into a pre-production environment. After a successful pipeline run in the pre-production environment, deploy the pipeline to production.
  • D. Set up a CI/CD pipeline that builds and tests your source code and then deploys built artifacts into a pre-production environment. After a successful pipeline run in the pre-production environment, rebuild the source code and deploy the artifacts to production.
Suggested Answer: C 🗳️

Comments

fitri001
1 year ago
Selected Answer: C
CI/CD Pipeline: This automates the build, test, and deployment process, enabling faster iterations and reducing manual errors. Pre-production Environment: Deploying to a pre-production environment (staging) allows you to test the new pipeline functionality with simulated real-world data. This helps identify and fix potential issues before impacting production. Successful Pipeline Run: Verifying a successful run in the pre-production environment provides confidence that the new pipeline functions as expected.
upvoted 3 times
...
pinimichele01
1 year, 1 month ago
Selected Answer: C
Unit test is insufficient, there should be a pipeline run.
upvoted 1 times
...
ddogg
1 year, 3 months ago
Selected Answer: C
C. Pre-production environment: Deploying to a pre-production environment before production allows you to thoroughly test the new pipeline's functionality and performance without affecting real-time traffic. Successful pipeline run: This ensures the entire pipeline executes correctly in the pre-production environment, including training, model pushing, and endpoint deployment. No rebuild in production: Rebuilding the source code after a successful pre-production run is unnecessary and adds an extra step that could potentially introduce new errors.
upvoted 4 times
...
36bdc1e
1 year, 4 months ago
C The best option for continuing experimenting and iterating on your pipeline to improve model performance, using Cloud Build for CI/CD, and deploying new pipelines into production quickly and easily, is to set up a CI/CD pipeline that builds and tests your source code and then deploys built artifacts into a pre-production environment. After a successful pipeline run in the pre-production environment, deploy the pipeline to production. This option allows you to leverage the power and simplicity of Cloud Build to automate, monitor, and manage your pipeline development and deployment workflow.
upvoted 1 times
...
pikachu007
1 year, 4 months ago
Selected Answer: C
C. Set up a CI/CD pipeline that builds and tests your source code and then deploys built artifacts into a pre-production environment. After a successful pipeline run in the pre-production environment, deploy the pipeline to production. A - Does not have a pre-production environment. B - Unit tests are insufficient; there should be a pipeline run. D - (Uncertain) but there shouldn't be a rebuild, as you have already built and tested successfully; it feels redundant to rebuild.
upvoted 3 times
...
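The promotion rule behind option C can be written down as a tiny decision function (a sketch only; the function name and messages are invented, and real gating would live in Cloud Build trigger config, not application code):

```python
# Promote the already-built artifacts only after unit tests pass AND a
# full pipeline run succeeds in pre-production; never rebuild for prod,
# since rebuilding could introduce differences from what was tested.
def promotion_decision(tests_passed, preprod_run_ok):
    if not tests_passed:
        return "stop: fix failing tests"
    if not preprod_run_ok:
        return "stop: debug the pre-production pipeline run"
    return "promote built artifacts to production"
```

The third branch is what distinguishes C from D: the same artifacts that ran in pre-production are promoted, with no rebuild in between.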

Topic 1 Question 181


You work for a bank with strict data governance requirements. You recently implemented a custom model to detect fraudulent transactions. You want your training code to download internal data by using an API endpoint hosted in your project’s network. You need the data to be accessed in the most secure way, while mitigating the risk of data exfiltration. What should you do?

  • A. Enable VPC Service Controls for peerings, and add Vertex AI to a service perimeter.
  • B. Create a Cloud Run endpoint as a proxy to the data. Use Identity and Access Management (IAM) authentication to secure access to the endpoint from the training job.
  • C. Configure VPC Peering with Vertex AI, and specify the network of the training job.
  • D. Download the data to a Cloud Storage bucket before calling the training job.
Suggested Answer: A 🗳️

Comments

OpenKnowledge
1 month, 3 weeks ago
Selected Answer: A
VPC Service Controls are a perimeter-based security feature that prevents data exfiltration by establishing a hard boundary around your Google Cloud resources. VPC peering is a network connectivity feature that allows you to establish private network connections between two separate VPC networks. While VPC peering enables communication, VPC Service Controls restrict it, offering different solutions to different problems: peering for connectivity, and service controls for data loss prevention.
upvoted 1 times
...
lunalongo
11 months, 1 week ago
Selected Answer: A
A is the right answer because it provides the strongest security posture, which the question statement emphasizes. VPC Service Controls offer a more robust defense against data exfiltration. Why the other options are wrong: *B) If the proxy is compromised, data is exposed. *C) Peering establishes network connectivity; it lacks inherent data access control. *D) Downloading to Cloud Storage introduces a data-at-rest vulnerability.
upvoted 2 times
...
tardigradum
1 year, 3 months ago
Selected Answer: A
VPC Service Controls: This feature allows you to define network boundaries (service perimeters) and control the flow of data between services. By adding Vertex AI to a service perimeter, you can restrict its access to only the necessary resources, including the API endpoint. With peerings you can enable secure communication between your VPC and the VPC where Vertex AI is running, ensuring data stays within your network boundary.
upvoted 3 times
...
dija123
1 year, 4 months ago
Selected Answer: A
A is correct
upvoted 1 times
...
peppenapo7
1 year, 6 months ago
Selected Answer: A
It's literally written in the description of this service: avoid data exfiltration.
upvoted 4 times
...
fitri001
1 year, 6 months ago
Selected Answer: B
Security: Cloud Run offers a secure environment to run your proxy code. IAM authentication ensures only authorized training jobs have access to the data endpoint. Data Minimization: The proxy can potentially filter or transform data before sending it to the training code, reducing the amount of sensitive information exposed. Network Isolation: The proxy acts as an additional layer of isolation between the training code and the internal data source.
upvoted 2 times
fitri001
1 year, 6 months ago
A. VPC Service Controls: While VPC Service Controls offer network segmentation, they wouldn't directly address data exfiltration risk from the training code itself. C. VPC Peering: VPC Peering allows communication between networks but doesn't provide access control mechanisms like IAM. D. Downloading to Cloud Storage: This approach creates an unnecessary data transfer step and doesn't address the risk of the training code potentially leaking data after download.
upvoted 1 times
pinimichele01
1 year, 6 months ago
https://cloud.google.com/vpc-service-controls/docs/overview#how-vpc-service-controls-works
upvoted 1 times
...
...
...
pinimichele01
1 year, 7 months ago
Selected Answer: A
To mitigate data exfiltration risks, your organization might also want to ensure secure data exchange across organizational boundaries with fine-grained controls. As an administrator, you might want to ensure the following: Clients with privileged access don't also have access to partner resources. Clients with access to sensitive data can only read public data sets but not write to them
upvoted 1 times
...
Sunny_M
1 year, 8 months ago
It should be A, VPC service controls can reduce data exfiltration risks. https://cloud.google.com/vpc-service-controls/docs/overview
upvoted 2 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: B
My Answer B: Creating a Cloud Run endpoint as a proxy to the data allows you to control access to the internal data through an API endpoint. By using IAM authentication, you can enforce strict access controls, ensuring that only authorized entities (such as your training job) can access the data. This approach helps mitigate the risk of data exfiltration by providing a secure and controlled access point to the internal data. - Option A: may help control access within Google Cloud Platform services, but it does not directly address securing access to the internal data through an API endpoint. - Option C: is more about network configurations and does not provide a solution for securely accessing the internal data through an API endpoint. - Option D: transferring the data to a Cloud Storage bucket, which might introduce additional security risks during the data transfer process.
upvoted 3 times
...
ddogg
1 year, 9 months ago
Selected Answer: A
A. https://cloud.google.com/security/vpc-service-controls?hl=en The first benefit on the official google cloud site is "Mitigate data exfiltration risks" Here's why: VPC Service Controls: This powerful tool allows you to restrict the network connectivity of resources within your VPC network. By enabling it for peerings, you can control which services within your project can access specific internal resources. Service perimeter: Adding Vertex AI to a service perimeter further restricts its access to only approved internal resources, including the API endpoint for your bank's data. This creates a secure zone where your model training can happen without jeopardizing sensitive data.
upvoted 1 times
...
daidai75
1 year, 9 months ago
Selected Answer: A
I will go with A.
upvoted 1 times
...
pikachu007
1 year, 10 months ago
Selected Answer: B
It provides a controlled and secure way to allow the training job to access the necessary data while adhering to strict data governance requirements.
upvoted 1 times
...

Topic 1 Question 182


You are deploying a new version of a model to a production Vertex AI endpoint that is serving traffic. You plan to direct all user traffic to the new model. You need to deploy the model with minimal disruption to your application. What should you do?

  • A. 1. Create a new endpoint
    2. Create a new model. Set it as the default version. Upload the model to Vertex AI Model Registry
    3. Deploy the new model to the new endpoint
    4. Update Cloud DNS to point to the new endpoint
  • B. 1. Create a new endpoint
    2. Create a new model. Set the parentModel parameter to the model ID of the currently deployed model and set it as the default version. Upload the model to Vertex AI Model Registry
    3. Deploy the new model to the new endpoint, and set the new model to 100% of the traffic.
  • C. 1. Create a new model. Set the parentModel parameter to the model ID of the currently deployed model. Upload the model to Vertex AI Model Registry.
    2. Deploy the new model to the existing endpoint, and set the new model to 100% of the traffic
  • D. 1. Create a new model. Set it as the default version. Upload the model to Vertex AI Model Registry
    2. Deploy the new model to the existing endpoint
Suggested Answer: C 🗳️

Comments

fitri001
1 year ago
Selected Answer: C
Minimal Downtime: By deploying the new model to the existing endpoint, you avoid any service interruptions caused by creating and switching to a completely new endpoint. Versioning: Setting the parentModel parameter allows you to track the lineage of your models and easily revert to the previous version if needed. Traffic Control: Vertex AI lets you control traffic allocation between different versions of a model deployed on the same endpoint. Setting the new model to 100% traffic directs all user requests to the new version.
upvoted 4 times
fitri001
1 year ago
A. Creating a New Endpoint: This approach introduces downtime as you need to switch DNS records to point to the new endpoint. B. Creating a New Endpoint with Default Version: While using a parentModel helps with versioning, creating a new endpoint still leads to service disruption. D. Deploying to Existing Endpoint Without Traffic Control: This might cause unexpected behavior if the new model isn't ready for production traffic.
upvoted 2 times
...
...
guilhermebutzke
1 year, 2 months ago
Selected Answer: C
My Answer: C. In the context of deploying machine learning models, setting the parentModel parameter to the model ID of the currently deployed model means that the new model being deployed is created as a child model or an iteration of the existing model. This allows the new model to inherit certain properties or characteristics from the existing model, such as the architecture, hyperparameters, or feature transformations. Creating a new endpoint is unnecessary.
upvoted 2 times
...
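Option C's in-place rollout can be pictured with a toy traffic-split model. This is stdlib-only and illustrative; the class, version names, and rebalancing rule are inventions, not the Vertex AI SDK:

```python
# Deploying to the *existing* endpoint: the new version takes the
# requested share of traffic, older versions keep the remainder, and
# the endpoint URL (hence the application config) never changes.
class ToyEndpoint:
    def __init__(self):
        self.traffic_split = {}   # deployed model version -> percent

    def deploy(self, version, traffic_percentage):
        scale = (100 - traffic_percentage) / 100
        for v in self.traffic_split:
            self.traffic_split[v] = round(self.traffic_split[v] * scale)
        self.traffic_split[version] = traffic_percentage

ep = ToyEndpoint()
ep.deploy("fraud@v1", traffic_percentage=100)
# New child version (parentModel = fraud) takes all traffic; v1 stays
# deployed at 0% as an instant rollback path.
ep.deploy("fraud@v2", traffic_percentage=100)
```

After the second deploy, v2 serves 100% of requests while v1 remains available, which is the minimal-disruption cutover the question asks for.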
sonicclasps
1 year, 3 months ago
Selected Answer: D
Optionally set this model as the default version. The default version is preselected whenever the model is used for prediction (although you can still select other versions). https://cloud.google.com/vertex-ai/docs/model-registry/versioning
upvoted 1 times
...
BlehMaks
1 year, 4 months ago
Selected Answer: C
a, b - creating a new endpoint is an unnecessary disruption to the application; d - doesn't work: with two models on the same endpoint, traffic still goes through the old model
upvoted 1 times
...
pikachu007
1 year, 4 months ago
Selected Answer: C
Leverages existing endpoint: Using the same endpoint maintains the same endpoint URL, avoiding DNS updates and potential service interruptions. Gradual traffic transition: Vertex AI allows you to gradually shift traffic between model versions, ensuring a smooth transition without impacting users. Clear versioning: Setting parentModel establishes a relationship between the new model and the existing one, aiding in organization and tracking model lineage.
upvoted 3 times
...

Topic 1 Question 183


You are training an ML model on a large dataset. You are using a TPU to accelerate the training process. You notice that the training process is taking longer than expected. You discover that the TPU is not reaching its full capacity. What should you do?

  • A. Increase the learning rate
  • B. Increase the number of epochs
  • C. Decrease the learning rate
  • D. Increase the batch size
Suggested Answer: D 🗳️

Comments

fitri001
Highly Voted 1 year ago
Selected Answer: D
A common reason for underutilized TPUs is a small batch size. TPUs are designed for high throughput, and feeding them small batches doesn't leverage their full potential. Try increasing the batch size while monitoring model performance. A larger batch size can lead to faster training but might also affect accuracy. Experiment to find the optimal balance.
upvoted 5 times
...
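The overhead argument above can be made concrete with a toy calculation. The numbers below are hypothetical, not TPU measurements: the point is only that per-step overhead (host transfer, kernel launch) is paid once per step, so fewer, larger steps spend a greater fraction of wall time on useful compute.

```python
import math

# Toy model of accelerator utilization vs. batch size (hypothetical numbers).
def utilization(num_examples, batch_size, overhead_per_step, compute_per_example):
    steps = math.ceil(num_examples / batch_size)          # steps per epoch
    compute = num_examples * compute_per_example          # useful work
    total = compute + steps * overhead_per_step           # work + fixed costs
    return compute / total

small = utilization(1_000_000, 8, overhead_per_step=1.0, compute_per_example=0.01)
large = utilization(1_000_000, 1024, overhead_per_step=1.0, compute_per_example=0.01)
print(f"batch=8: {small:.0%} utilized, batch=1024: {large:.0%} utilized")
```

With these made-up constants the tiny batch leaves the accelerator mostly idle, while the large batch keeps it busy, which is why option D is the standard first fix for an underutilized TPU.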
36bdc1e
Most Recent 1 year, 4 months ago
D: a bigger batch size uses more of the available memory and decreases the training time
upvoted 1 times
...
BlehMaks
1 year, 4 months ago
Selected Answer: D
Batch size is too small because of sharding https://cloud.google.com/tpu/docs/performance-guide
upvoted 1 times
...
pikachu007
1 year, 4 months ago
Selected Answer: D
D, the bigger the batch size, the more resource is taken up
upvoted 2 times
...

Topic 1 Question 184

You work for a retail company. You have a managed tabular dataset in Vertex AI that contains sales data from three different stores. The dataset includes several features, such as store name and sale timestamp. You want to use the data to train a model that makes sales predictions for a new store that will open soon. You need to split the data between the training, validation, and test sets. What approach should you use to split the data?

  • A. Use Vertex AI manual split, using the store name feature to assign one store for each set
  • B. Use Vertex AI default data split
  • C. Use Vertex AI chronological split, and specify the sales timestamp feature as the time variable
  • D. Use Vertex AI random split, assigning 70% of the rows to the training set, 10% to the validation set, and 20% to the test set
Suggested Answer: C 🗳️

Comments

fitri001
Highly Voted 1 year, 6 months ago
Selected Answer: C
Time-Series Data: Your sales data has timestamps, indicating it's time-series data. A chronological split considers the order of the timestamps, ensuring the model is trained on historical trends. Predicting for New Store: Since you want to predict sales for a new store, a chronological split is better than a random split (option D) which wouldn't prioritize recent trends. Vertex AI Functionality: Vertex AI's chronological split functionality is specifically designed for time-series data and leverages the timestamp feature you provide to separate data for training, validation, and testing.
upvoted 7 times
fitri001
1 year, 6 months ago
A. Manual Split by Store: While this might work, it doesn't consider the time element crucial for sales predictions. The new store's performance might not be well-represented by data from a single existing store. B. Default Split (Random): The default random split in Vertex AI might not prioritize recent data which could be more relevant for predicting sales in the new store. D. Random Split with Specific Ratios: Similar to the default split, a random approach might not capture the time-series aspect and recent trends that are important for your new store predictions.
upvoted 1 times
...
...
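A chronological split of the kind discussed above can be sketched in plain Python. This is illustrative only: Vertex AI's managed chronological split does this for you when you mark the sale timestamp as the time column, and the rows, stores, and 80/10/10 ratios here are hypothetical.

```python
from datetime import date

# Toy sales rows: three stores, one row per store per month (hypothetical).
rows = [
    {"store": s, "sale_ts": date(2023, m, 1), "amount": 100 + m}
    for s in ("A", "B", "C")
    for m in range(1, 11)
]

rows.sort(key=lambda r: r["sale_ts"])     # order by time, not at random
n = len(rows)
train = rows[: int(n * 0.8)]              # oldest 80%
validation = rows[int(n * 0.8): int(n * 0.9)]
test = rows[int(n * 0.9):]                # most recent 10%

# Key property: evaluation data is never older than the training data,
# so no "future" information leaks into training.
assert max(r["sale_ts"] for r in train) <= min(r["sale_ts"] for r in test)
```

The invariant in the final assert is exactly what a random split (options B and D) cannot guarantee.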
spradhan
Most Recent 3 months, 3 weeks ago
Selected Answer: A
C results in data leak.
upvoted 1 times
...
Omi_04040
11 months ago
Selected Answer: A
Since the question asks about predicting sales for a new store, not sales prediction in general, the answer has to be 'A'
upvoted 1 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: C
My answer: C.
A: Not correct. Splitting based on store name wouldn't guarantee temporal separation of data. Furthermore, assigning one store to each set doesn't fit this problem, because the target is a new store.
B: Not correct. Randomly choosing data points across different time periods could keep the model from capturing seasonal trends or temporal patterns effectively.
C: CORRECT. It leverages the chronological nature of the data. Since the dataset contains sales data over time from different stores, a chronological split ensures the model is trained on earlier time periods and validated/tested on more recent data.
D: Not correct. Like B, a custom random split wouldn't ensure temporal separation and could miss temporal trends.
upvoted 2 times
...
shadz10
1 year, 10 months ago
Selected Answer: C
I agree with b1a8fae
upvoted 1 times
...
BlehMaks
1 year, 10 months ago
Selected Answer: C
https://cloud.google.com/automl-tables/docs/data-best-practices#time
upvoted 1 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: C
Anything different than option C could potentially lead to data leakage imo.
upvoted 1 times
...
pikachu007
1 year, 10 months ago
Selected Answer: A
By using a manual split based on store names, you can train a model that is more sensitive to the unique characteristics of each store, ultimately leading to better predictions for the new store.
upvoted 1 times
DaleR
11 months, 1 week ago
All the research and document supports this answer.
upvoted 1 times
...
...
vale_76_na_xxx
1 year, 10 months ago
I say C , time-based splitting is always suggest
upvoted 1 times
...

Topic 1 Question 185

You have developed a BigQuery ML model that predicts customer churn, and deployed the model to Vertex AI Endpoints. You want to automate the retraining of your model by using minimal additional code when model feature values change. You also want to minimize the number of times that your model is retrained to reduce training costs. What should you do?

  • A. 1 Enable request-response logging on Vertex AI Endpoints
    2. Schedule a TensorFlow Data Validation job to monitor prediction drift
    3. Execute model retraining if there is significant distance between the distributions
  • B. 1. Enable request-response logging on Vertex AI Endpoints
    2. Schedule a TensorFlow Data Validation job to monitor training/serving skew
    3. Execute model retraining if there is significant distance between the distributions
  • C. 1. Create a Vertex AI Model Monitoring job configured to monitor prediction drift
    2. Configure alert monitoring to publish a message to a Pub/Sub queue when a monitoring alert is detected
    3. Use a Cloud Function to monitor the Pub/Sub queue, and trigger retraining in BigQuery
  • D. 1. Create a Vertex AI Model Monitoring job configured to monitor training/serving skew
    2. Configure alert monitoring to publish a message to a Pub/Sub queue when a monitoring alert is detected
    3. Use a Cloud Function to monitor the Pub/Sub queue, and trigger retraining in BigQuery
Suggested Answer: C 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 8 months ago
Selected Answer: D
My answer: D Given the emphasis on "model feature values change" in the question, the most suitable option would be D. Although option C involves monitoring prediction drift, which may indirectly capture changes in feature values, option D directly addresses the need to monitor training/serving skew. By detecting discrepancies between the training and serving data distributions, option D is more aligned with the requirement to automate retraining when model feature values change. Therefore, option D is the most suitable choice in this context.
upvoted 8 times
...
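Whichever objective is chosen, the statistical test behind the monitoring job is a distance between two feature distributions that fires an alert past a threshold. The sketch below illustrates the idea with an L-infinity distance on a categorical feature; the feature values, probabilities, and 0.3 threshold are hypothetical, and in practice Vertex AI Model Monitoring computes this for you and publishes the alert (here, the alert would go to Pub/Sub and trigger the Cloud Function that reruns training in BigQuery).

```python
# Toy drift check: compare a baseline feature distribution with the recent
# serving distribution and alert when the distance crosses a threshold.

def l_infinity_distance(baseline: dict, current: dict) -> float:
    """Largest absolute difference in category probabilities."""
    keys = set(baseline) | set(current)
    return max(abs(baseline.get(k, 0.0) - current.get(k, 0.0)) for k in keys)

# Hypothetical distributions of a "payment_method" feature.
baseline = {"card": 0.6, "cash": 0.3, "transfer": 0.1}
current = {"card": 0.2, "cash": 0.3, "transfer": 0.5}

distance = l_infinity_distance(baseline, current)
drift_detected = distance >= 0.3   # alert -> Pub/Sub -> Cloud Function retrain
print(distance, drift_detected)
```

Alerting only when the distance crosses the threshold is what keeps retraining (and cost) to a minimum, rather than retraining on a fixed schedule.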
bobjr
Highly Voted 1 year, 5 months ago
Selected Answer: C
Skew should be detected at the start of productionizing the model: a skew test compares the training data vs. the real data, and skew indicates you trained on a dataset that is not aligned with the input data you actually receive. Drift applies when the model works well at the beginning, but the world changes and the input data changes; drift is the longer-term concern. Here it is a drift issue.
upvoted 5 times
rajshiv
11 months, 1 week ago
the issue is "drift" and not "Skew". Hence C is more correct as compared to D.
upvoted 2 times
...
Prakzz
1 year, 4 months ago
Agreed
upvoted 1 times
...
...
OpenKnowledge
Most Recent 1 month, 3 weeks ago
Selected Answer: C
This is data drift issue; not a skew issue. So the answer is C Skew and drift both describe changes in data, but they differ in their cause and timing. Skew refers to a sudden mismatch between your training data and production data, which can occur immediately upon deployment. Drift is a gradual change in the production data's statistical properties over time, causing model performance to degrade.
upvoted 1 times
...
Begum
6 months ago
Selected Answer: D
C -> Prediction drift (When the overall distribution of predictions changes significantly between training and serving data). "You want to automate when the feature value changes" D -> Training/serving skew (When the distribution of specific features between training and serving data differs significantly)
upvoted 1 times
...
bc3f222
8 months ago
Selected Answer: C
Training/serving skew monitoring is best used to detect mismatches between training and serving data schemas—not feature drift over time. Prediction drift is more relevant for this use case.
upvoted 1 times
...
f084277
12 months ago
Selected Answer: C
Skew is static, drift happens over time. Answer is C.
upvoted 2 times
...
Shno
1 year, 6 months ago
if the model training is done through bigquery ML, we don't have access to the training data after export, so I don't understand how training/serving skew can be applied. Can someone who is voting in favour of D clarify?
upvoted 1 times
...
gscharly
1 year, 6 months ago
Selected Answer: D
I go with D
upvoted 1 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: D
It's D
upvoted 1 times
pinimichele01
1 year, 6 months ago
see guilhermebutzke
upvoted 1 times
...
...
CHARLIE2108
1 year, 8 months ago
Selected Answer: D
changed my mind it's D
upvoted 3 times
...
CHARLIE2108
1 year, 9 months ago
Selected Answer: C
I go with C but D is pretty similar. C -> Prediction drift (When the overall distribution of predictions changes significantly between training and serving data). D -> Training/serving skew (When the distribution of specific features between training and serving data differs significantly).
upvoted 3 times
CHARLIE2108
1 year, 8 months ago
It's D
upvoted 1 times
...
...
ddogg
1 year, 9 months ago
Selected Answer: C
Option C: This option directly addresses your requirements: Vertex AI Model Monitoring: It allows efficient monitoring of prediction drift through metrics like Mean Squared Error or AUC-ROC. Pub/Sub alerts: Alert triggers notification upon significant drift, minimizing unnecessary retraining. Cloud Function: It reacts to Pub/Sub messages and triggers retraining in BigQuery using minimal additional code.
upvoted 3 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: C
After reconsidering, I think it is C: - No need to use TF to enable model monitoring as stated here: https://cloud.google.com/vertex-ai/docs/model-monitoring/using-model-monitoring (even if it uses it under the hood: https://cloud.google.com/vertex-ai/docs/model-monitoring/overview#calculating-skew-and-drift) - The problem speaks about alerting of model feature changes, which happens over time, and uses a baseline of the historical production data -> prediction skew. (if the problem specified that it changes compared to training data, then it would be training-skew) (https://cloud.google.com/vertex-ai/docs/model-monitoring/monitor-explainable-ai#feature_attribution_training-serving_skew_and_prediction_drift)
upvoted 4 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: D
I would avoid using TensorFlow validation to minimize code written. That leaves us with options C and D. Now, since it is the values of the features that we want to flag and not the value of the predictions, this sounds more like training-serving skew situation than prediction drift. Hence, I would go for D.
upvoted 4 times
...
BlehMaks
1 year, 10 months ago
Selected Answer: D
i've changed my mind) it's D https://www.evidentlyai.com/blog/machine-learning-monitoring-data-and-concept-drift
upvoted 1 times
...
BlehMaks
1 year, 10 months ago
Selected Answer: D
we might need to retrain if the feature data distribution in the production and training are significantly different(training/serving skew). Prediction drift occurs when feature data distribution in production changes significantly over time. Should we retrain our model every time when we meet prediction drift? I dont think so, better to analyze why this drift happens. https://cloud.google.com/vertex-ai/docs/model-monitoring/overview#considerations
upvoted 1 times
...
36bdc1e
1 year, 10 months ago
C The best option for automating the retraining of your model by using minimal additional code when model feature values change, and minimizing the number of times that your model is retrained to reduce training costs, is to create a Vertex AI Model Monitoring job configured to monitor prediction drift, configure alert monitoring to publish a message to a Pub/Sub queue when a monitoring alert is detected, and use a Cloud Function to monitor the Pub/Sub queue, and trigger retraining in BigQuery. This option allows you to leverage the power and simplicity of Vertex AI, Pub/Sub, and Cloud Functions to monitor your model performance and retrain your model when needed. Vertex AI is a unified platform for building and deploying machine learning solutions on Google Cloud.
upvoted 2 times
...

Topic 1 Question 186

You have been tasked with deploying prototype code to production. The feature engineering code is in PySpark and runs on Dataproc Serverless. The model training is executed by using a Vertex AI custom training job. The two steps are not connected, and the model training must currently be run manually after the feature engineering step finishes. You need to create a scalable and maintainable production process that runs end-to-end and tracks the connections between steps. What should you do?

  • A. Create a Vertex AI Workbench notebook. Use the notebook to submit the Dataproc Serverless feature engineering job. Use the same notebook to submit the custom model training job. Run the notebook cells sequentially to tie the steps together end-to-end.
  • B. Create a Vertex AI Workbench notebook. Initiate an Apache Spark context in the notebook and run the PySpark feature engineering code. Use the same notebook to run the custom model training job in TensorFlow. Run the notebook cells sequentially to tie the steps together end-to-end.
  • C. Use the Kubeflow pipelines SDK to write code that specifies two components:
    - The first is a Dataproc Serverless component that launches the feature engineering job
    - The second is a custom component wrapped in the create_custom_training_job_from_component utility that launches the custom model training job
    Create a Vertex AI Pipelines job to link and run both components
  • D. Use the Kubeflow pipelines SDK to write code that specifies two components
    - The first component initiates an Apache Spark context that runs the PySpark feature engineering code
    - The second component runs the TensorFlow custom model training code
    Create a Vertex AI Pipelines job to link and run both components.
Suggested Answer: C 🗳️

Comments

Akel123
11 months, 2 weeks ago
Selected Answer: C
The first is a Dataproc Serverless component that launches the feature engineering job The second is a custom component wrapped in the create_custom_training_job_from_component utility that launches the custom model training job Create a Vertex AI Pipelines job to link and run both components
upvoted 2 times
...
fitri001
1 year ago
Selected Answer: C
The first is a Dataproc Serverless component that launches the feature engineering job The second is a custom component wrapped in the create_custom_training_job_from_component utility that launches the custom model training job Create a Vertex AI Pipelines job to link and run both components
upvoted 2 times
fitri001
1 year ago
A. Vertex AI Workbench notebook: While notebooks are a good way to prototype workflows, they are not ideal for production due to limitations in scalability and version control. Running everything sequentially also doesn't allow for potential parallelization of tasks. B. Apache Spark context in notebook: Similar to A, notebooks are not ideal for production. Additionally, running the model training with TensorFlow within the notebook ties the process to a specific framework, making it less flexible. D. Kubeflow pipelines with Spark context: This option gets close, but it's unnecessary to initiate a Spark context within the first component. Dataproc Serverless already handles the Spark environment for running PySpark code.
upvoted 3 times
...
...
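What "link the components and track the connections" buys you can be shown with a toy sketch. This is not the Kubeflow Pipelines API: all names are hypothetical, and in option C the KFP SDK and Vertex AI Pipelines record this lineage for real. The point is that each step declares its upstream dependency, so the edge between feature engineering and training is explicit instead of a manual handoff.

```python
# Toy orchestration: run two "components" end-to-end and record the lineage
# edge between them (illustration only; KFP/Vertex AI Pipelines do this).

lineage = []  # (producer_step, consumer_step) edges

def run_step(name, fn, upstream=None):
    if upstream is not None:
        lineage.append((upstream, name))   # track the connection
    return fn()

features = run_step("dataproc-feature-engineering", lambda: ["f1", "f2"])
model = run_step(
    "vertex-custom-training",
    lambda: f"model trained on {len(features)} features",
    upstream="dataproc-feature-engineering",
)
print(model)
print(lineage)
```

Notebooks run sequentially (options A and B) produce the same outputs but leave no machine-readable record of this edge, which is why they fail the "maintainable, tracked" requirement.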
CHARLIE2108
1 year, 3 months ago
Selected Answer: C
I went with C
upvoted 1 times
...
kalle_balle
1 year, 4 months ago
Selected Answer: C
Vote C
upvoted 1 times
...
36bdc1e
1 year, 4 months ago
C The best option for creating a scalable and maintainable production process that runs end-to-end and tracks the connections between steps, using prototype code to production, feature engineering code in PySpark that runs on Dataproc Serverless, and model training that is executed by using a Vertex AI custom training job, is to use the Kubeflow pipelines SDK to write code that specifies two components. The first is a Dataproc Serverless component that launches the feature engineering job. The second is a custom component wrapped in the create_custom_training_job_from_component utility that launches the custom model training job. This option allows you to leverage the power and simplicity of Kubeflow pipelines to orchestrate and automate your machine learning workflows on Vertex AI. Kubeflow pipelines is a platform that can build, deploy, and manage machine learning pipelines on Kubernetes.
upvoted 1 times
...
pikachu007
1 year, 4 months ago
Selected Answer: C
By using Kubeflow Pipelines, you establish a structured, scalable, and maintainable production process for end-to-end model development and deployment, ensuring proper orchestration, tracking, and integration with the chosen services.
upvoted 3 times
...
vale_76_na_xxx
1 year, 4 months ago
I go for C
upvoted 1 times
...

Topic 1 Question 187

You recently deployed a scikit-learn model to a Vertex AI endpoint. You are now testing the model on live production traffic. While monitoring the endpoint, you discover twice as many requests per hour than expected throughout the day. You want the endpoint to efficiently scale when the demand increases in the future to prevent users from experiencing high latency. What should you do?

  • A. Deploy two models to the same endpoint, and distribute requests among them evenly
  • B. Configure an appropriate minReplicaCount value based on expected baseline traffic
  • C. Set the target utilization percentage in the autoscalingMetricSpecs configuration to a higher value
  • D. Change the model’s machine type to one that utilizes GPUs
Suggested Answer: B 🗳️

Comments

fitri001
Highly Voted 1 year ago
Selected Answer: B
Autoscaling based on baseline: Vertex AI endpoints have built-in autoscaling capabilities. Setting a minReplicaCount ensures there are always at least that many replicas running, handling the baseline traffic efficiently. When demand increases above the baseline, autoscaling will automatically provision additional replicas to maintain performance. Efficient scaling: This approach allows the endpoint to scale up smoothly as traffic increases, preventing sudden spikes in latency for users. Targeted resource allocation: Unlike option A (deploying multiple models), this method avoids redundant resources when traffic is low. Additionally, option D (switching to GPUs) might be unnecessary if the bottleneck isn't processing power.
upvoted 7 times
fitri001
1 year ago
A. Deploying multiple models: This creates additional overhead and resource usage without directly addressing autoscaling. Traffic distribution may also not be perfectly even. C. Increasing target utilization: Raising the target utilization could lead to under-provisioning during peak hours, causing latency issues. It's better to set a baseline with minReplicaCount and let autoscaling handle peak loads. D. Switching to GPUs: While GPUs can be beneficial for computationally intensive models, it might be an unnecessary expense if the current model doesn't heavily utilize the CPU. Analyze the CPU usage before switching to a GPU-based machine type.
upvoted 2 times
...
...
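A back-of-the-envelope autoscaling calculation makes the B-vs-C trade-off concrete. The request rates and per-replica capacity below are hypothetical, but the shape of the formula (replicas sized so average utilization sits near the target, floored at `minReplicaCount`) matches how such autoscalers behave.

```python
import math

# Sketch: how many replicas an endpoint autoscaler would aim for
# (hypothetical numbers, simplified formula).
def desired_replicas(requests_per_s, capacity_per_replica,
                     target_utilization, min_replicas):
    needed = math.ceil(requests_per_s / (capacity_per_replica * target_utilization))
    return max(needed, min_replicas)          # never scale below the floor

# Baseline traffic doubled from 50 to 100 req/s; each replica handles 20 req/s.
# Option B: raise minReplicaCount so enough replicas stay warm for the
# new baseline (no cold-start latency when traffic arrives).
print(desired_replicas(100, 20, 0.6, 9))
# Option C: raising the target utilization instead *reduces* the replica
# count, so each replica runs hotter and latency spikes become more likely.
print(desired_replicas(100, 20, 0.9, 1))
```

The second call yields fewer replicas than the first, which is the quantitative version of the comments above: a higher utilization target works against the latency goal.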
guilhermebutzke
Most Recent 1 year, 3 months ago
Selected Answer: B
My answer: B. The letter C would be correct if the target were set lower to anticipate traffic spikes, not higher as the option says. Given that traffic is now twice the expected value, B is the most appropriate answer: set a new minReplicaCount based on the observed baseline traffic.
upvoted 2 times
...
Yan_X
1 year, 3 months ago
Selected Answer: B
Not C: if the target is set to a higher value, it is harder for autoscaling to add another instance, because utilization has to climb to an even higher value before scaling triggers.
upvoted 4 times
...
b1a8fae
1 year, 3 months ago
Selected Answer: C
I go with C. It calculates the number of replicas based on CPU utilization. https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.AutoscalingMetricSpec
upvoted 1 times
...
36bdc1e
1 year, 4 months ago
B This option allows you to leverage the power and simplicity of Vertex AI to automatically scale your endpoint resources according to the traffic patterns.
upvoted 2 times
...
pikachu007
1 year, 4 months ago
Selected Answer: C
c as it is dynamic
upvoted 1 times
sonicclasps
1 year, 3 months ago
yes it's dynamic, but the target should be set lower, not higher, if you want to anticipate traffic spikes.
upvoted 3 times
...
...

Topic 1 Question 188

You work at a bank. You have a custom tabular ML model that was provided by the bank’s vendor. The training data is not available due to its sensitivity. The model is packaged as a Vertex AI Model serving container, which accepts a string as input for each prediction instance. In each string, the feature values are separated by commas. You want to deploy this model to production for online predictions and monitor the feature distribution over time with minimal effort. What should you do?

  • A. 1. Upload the model to Vertex AI Model Registry, and deploy the model to a Vertex AI endpoint
    2. Create a Vertex AI Model Monitoring job with feature drift detection as the monitoring objective, and provide an instance schema
  • B. 1. Upload the model to Vertex AI Model Registry, and deploy the model to a Vertex AI endpoint
    2. Create a Vertex AI Model Monitoring job with feature skew detection as the monitoring objective, and provide an instance schema
  • C. 1. Refactor the serving container to accept key-value pairs as input format
    2. Upload the model to Vertex AI Model Registry, and deploy the model to a Vertex AI endpoint
    3. Create a Vertex AI Model Monitoring job with feature drift detection as the monitoring objective.
  • D. 1. Refactor the serving container to accept key-value pairs as input format
    2. Upload the model to Vertex AI Model Registry, and deploy the model to a Vertex AI endpoint
    3. Create a Vertex AI Model Monitoring job with feature skew detection as the monitoring objective
Suggested Answer: A 🗳️

Comments

b1a8fae
Highly Voted 1 year, 3 months ago
Selected Answer: A
A. Minimum effort -> ditch refactoring (hopefully not needed) Training data not available -> can't be skew, so it must be drift
upvoted 5 times
...
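Why option A asks for an *instance schema*: the vendor's container takes a bare comma-separated string, so the monitoring job cannot infer feature names or types on its own; a schema maps each position to a named, typed feature that can then be monitored for drift. The sketch below shows the parsing the schema enables. Feature names, types, and values are hypothetical.

```python
# Toy instance schema: position -> (feature name, type) for the
# comma-separated prediction input (hypothetical features).
SCHEMA = [("age", float), ("income", float), ("account_type", str)]

def parse_instance(instance: str) -> dict:
    """Turn '42,52000.5,checking' into named, typed features."""
    values = instance.split(",")
    if len(values) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} features, got {len(values)}")
    return {name: cast(raw) for (name, cast), raw in zip(SCHEMA, values)}

parsed = parse_instance("42,52000.5,checking")
print(parsed)
```

Supplying the schema to the monitoring job avoids refactoring the serving container to key-value input (options C and D), which is the "minimal effort" part of the question.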
OpenKnowledge
Most Recent 1 month, 3 weeks ago
Selected Answer: A
This is a drift issue. Also, Vertex AI Model Monitoring can handle string data directly, including parsing comma-separated feature values, without requiring a refactoring of the string data into a key-value format for the serving container.
upvoted 1 times
...
pinimichele01
1 year, 1 month ago
Selected Answer: A
Training data not available -> can't be skew, so it must be drift
upvoted 3 times
...
CHARLIE2108
1 year, 3 months ago
I have a doubt, could someone please help with this? While "drift" (Option A) might imply gradual changes, "skew" (Option B) is more suitable for sudden shifts in feature distributions, potentially relevant for sensitive data. Is option B better than A?
upvoted 1 times
tavva_prudhvi
1 year ago
Feature skew is typically used to compare the feature distribution between training data and serving data, which is not as relevant here because you do not have access to the training data. Therefore, Option B is less suitable.
upvoted 4 times
...
...
pikachu007
1 year, 4 months ago
Selected Answer: A
Handles string input format: Vertex AI Model Monitoring can parse comma-separated feature values, avoiding the need to refactor the serving container. It directly monitors feature distribution over time, aligning with the goal of detecting potential drifts.
upvoted 1 times
...

Topic 1 Question 189

You are implementing a batch inference ML pipeline in Google Cloud. The model was developed using TensorFlow and is stored in SavedModel format in Cloud Storage. You need to apply the model to a historical dataset containing 10 TB of data that is stored in a BigQuery table. How should you perform the inference?

  • A. Export the historical data to Cloud Storage in Avro format. Configure a Vertex AI batch prediction job to generate predictions for the exported data
  • B. Import the TensorFlow model by using the CREATE MODEL statement in BigQuery ML. Apply the historical data to the TensorFlow model
  • C. Export the historical data to Cloud Storage in CSV format. Configure a Vertex AI batch prediction job to generate predictions for the exported data
  • D. Configure a Vertex AI batch prediction job to apply the model to the historical data in BigQuery
Suggested Answer: B 🗳️

Comments

edoo
Highly Voted 1 year, 8 months ago
Selected Answer: B
The choice is between B and D, both good, BUT: importing a model and making batch predictions is quite straightforward in BQ ML (https://cloud.google.com/bigquery/docs/making-predictions-with-imported-tensorflow-models) if no pre-processing of the data is needed. If we needed a more complete pipeline I'd choose D, but the tables would need partitioning (100 GB is the limit in Vertex AI): https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/get-batch-predictions#input_data_requirements
upvoted 5 times
...
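Option B comes down to two SQL statements that run where the 10 TB table already lives, so nothing is exported. The sketch below builds them in Python; the project, dataset, and bucket names are hypothetical, the statements follow BigQuery ML's `CREATE MODEL ... MODEL_TYPE='TENSORFLOW'` syntax, and the client calls are commented out.

```python
# Sketch: import a SavedModel into BigQuery ML and run in-place batch
# inference (hypothetical project/dataset/bucket names).

create_model_sql = """
CREATE OR REPLACE MODEL `my_project.ml.imported_tf_model`
OPTIONS (MODEL_TYPE = 'TENSORFLOW',
         MODEL_PATH = 'gs://my_bucket/saved_model/*')
"""

predict_sql = """
SELECT *
FROM ML.PREDICT(MODEL `my_project.ml.imported_tf_model`,
                TABLE `my_project.warehouse.historical_data`)
"""

# from google.cloud import bigquery
# client = bigquery.Client()
# client.query(create_model_sql).result()   # one-time model import
# client.query(predict_sql).result()        # batch inference over the table
print(create_model_sql.strip().splitlines()[0])
```

Bringing the model to the data this way avoids the export step that options A and C require.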
OpenKnowledge
Most Recent 1 month, 3 weeks ago
Selected Answer: B
Importing a TensorFlow model into BigQuery ML offers significant benefits, primarily by eliminating the need for data movement and simplifying the process of running inference on large datasets. In this case, the model is brought to the data, rather than the data being moved to the model for predictions. This saves time and cost, and it avoids the security risks associated with exporting large datasets.
upvoted 1 times
...
NamitSehgal
8 months, 3 weeks ago
Selected Answer: D
Managed Service: Vertex AI batch prediction
upvoted 2 times
...
lunalongo
11 months, 1 week ago
Selected Answer: A
A)
- BigQuery ML is not designed for the scale of a 10 TB dataset
- Batch prediction performs efficient batch inference on large GCS datasets
- Avro is a binary format, more compact and efficient to process than CSV
*B uses BQML; C uses CSV format; exporting to GCS is more efficient than performing Vertex AI predictions directly on BQ for this data volume.
upvoted 1 times
...
rajshiv
11 months, 1 week ago
Selected Answer: A
It should be A. The "CREATE MODEL" statement in BigQuery ML is meant for BigQuery-specific models, and do not support models like TensorFlow SavedModel out of the box. This option would not work for using a TensorFlow model stored in Cloud Storage.
upvoted 1 times
Omi_04040
11 months ago
Not true at all https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-tensorflow
upvoted 2 times
...
...
Foxy2021
1 year, 1 month ago
My answer is D.
upvoted 1 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: B
https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/get-batch-predictions#input_data_requirements
upvoted 1 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: D
My Answer: D The historical dataset is stored in BigQuery, which can be directly accessed by Vertex AI. Vertex AI offers batch prediction capabilities, allowing you to apply the model to the data stored in BigQuery without the need to export it. So, This approach leverages the scalability of Google Cloud infrastructure and avoids unnecessary data movement, being not necessary to export data to Cloud Store (options A and C), nor Import the TensorFlow model to BQ (option B).
upvoted 2 times
...
ddogg
1 year, 9 months ago
Selected Answer: B
https://cloud.google.com/bigquery/docs/making-predictions-with-imported-tensorflow-models
upvoted 2 times
...
sonicclasps
1 year, 9 months ago
Selected Answer: B
https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-tensorflow#limitations
upvoted 2 times
...
Zwi3b3l
1 year, 9 months ago
Selected Answer: B
Has to be B, because D has limitations: BigQuery data source tables must be no larger than 100 GB. https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/get-batch-predictions#input_data_requirements
upvoted 4 times
...
BlehMaks
1 year, 9 months ago
Selected Answer: A
Same platform as data == less computation required to load and pass it to model
upvoted 1 times
BlehMaks
1 year, 9 months ago
i mean B
upvoted 1 times
...
...
b1a8fae
1 year, 10 months ago
Selected Answer: D
It could either be B or D. It seems like most of the limitations of B are mentioned in the problem (https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-tensorflow#limitations) but some of them are not and we are left questioning if the model will match the remaining requirements. Therefore, I would go for D, which can import data from BigQuery. https://cloud.google.com/vertex-ai/docs/predictions/get-batch-predictions#bigquery
upvoted 2 times
...
pikachu007
1 year, 10 months ago
Selected Answer: D
Limitations of other options: A and C. Exporting data: Exporting 10 TB of data to Cloud Storage incurs additional storage costs, transfer time, and potential data management complexities. B. BigQuery ML: While BigQuery ML supports some TensorFlow models, it might have limitations with certain model architectures or features. Additionally, it might not be as optimized for large-scale batch inference as Vertex AI.
upvoted 1 times
...
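Several comments above turn on whether BigQuery ML can import a TensorFlow SavedModel from Cloud Storage. It can, via the documented CREATE MODEL ... MODEL_TYPE='TENSORFLOW' statement. A minimal sketch that builds that statement — the dataset, model, and bucket names are placeholders, not values from the question:

```python
def import_tf_model_sql(dataset: str, model_name: str, gcs_path: str) -> str:
    """Build the BigQuery ML statement that imports a TensorFlow SavedModel.

    Uses the documented CREATE MODEL ... MODEL_TYPE='TENSORFLOW' syntax;
    all identifiers here are placeholders for illustration.
    """
    return (
        f"CREATE OR REPLACE MODEL `{dataset}.{model_name}` "
        f"OPTIONS (MODEL_TYPE='TENSORFLOW', MODEL_PATH='{gcs_path}')"
    )

sql = import_tf_model_sql("mydataset", "imported_tf_model", "gs://my-bucket/model/*")
# The statement could then be run with a BigQuery client, e.g.
# google.cloud.bigquery.Client().query(sql).result()
```

Note that the limitations linked in the comments (e.g. imported model size limits) still apply, which is what the B-vs-D debate hinges on.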

Topic 1 Question 190


Exam Professional Machine Learning Engineer topic 1 question 190 discussion

You recently deployed a model to a Vertex AI endpoint. Your data drifts frequently, so you have enabled request-response logging and created a Vertex AI Model Monitoring job. You have observed that your model is receiving higher traffic than expected. You need to reduce the model monitoring cost while continuing to quickly detect drift. What should you do?

  • A. Replace the monitoring job with a DataFlow pipeline that uses TensorFlow Data Validation (TFDV)
  • B. Replace the monitoring job with a custom SQL script to calculate statistics on the features and predictions in BigQuery
  • C. Decrease the sample_rate parameter in the RandomSampleConfig of the monitoring job
  • D. Increase the monitor_interval parameter in the ScheduleConfig of the monitoring job
Suggested Answer: C 🗳️

Comments

LaxmanTiwari
1 year, 4 months ago
Quoting fitri001: A. Dataflow pipeline with TFDV: While Dataflow pipelines with TFDV can be used for data validation, they require additional development and management overhead compared to simply adjusting the Vertex AI Model Monitoring job configuration. B. Custom SQL script: Custom SQL scripts might not be as efficient or maintainable as the built-in Vertex AI Model Monitoring features. Additionally, it would require manually calculating drift metrics, which can be error-prone. D. Increase monitor_interval: Increasing the monitoring interval reduces the frequency of monitoring checks, potentially delaying drift detection. This is not ideal if data drifts frequently.
upvoted 2 times
...
fitri001
1 year, 6 months ago
Selected Answer: C
Reduced Monitoring Overhead: By decreasing the sample_rate, you instruct Vertex AI Model Monitoring to analyze a smaller percentage of incoming requests. This directly reduces the billing cost associated with monitoring. Fast Drift Detection: A well-chosen sampling rate can still provide enough data to capture significant data drift. Monitoring a smaller sample shouldn't significantly impact your ability to detect drift if it's happening rapidly.
upvoted 4 times
fitri001
1 year, 6 months ago
A. DataFlow pipeline with TFDV: While DataFlow pipelines with TFDV can be used for data validation, they require additional development and management overhead compared to simply adjusting the Vertex AI Model Monitoring job configuration. B. Custom SQL script: Custom SQL scripts might not be as efficient or maintainable as the built-in Vertex AI Model Monitoring features. Additionally, it would require manually calculating drift metrics, which can be error-prone. D. Increase monitor_interval: Increasing the monitoring interval reduces the frequency of monitoring checks, potentially delaying drift detection. This is not ideal if data drifts frequently.
upvoted 3 times
...
...
Carlose2108
1 year, 8 months ago
Selected Answer: C
I went with C.
upvoted 1 times
...
ddogg
1 year, 9 months ago
Selected Answer: C
C, as the sample size will be relative to the traffic and will also reduce costs.
upvoted 2 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: C
C. https://cloud.google.com/vertex-ai/docs/model-monitoring/overview#considerations
upvoted 1 times
...
pikachu007
1 year, 10 months ago
Selected Answer: C
The answer is C, simplest and does not affect the time it takes to detect the drift
upvoted 2 times
...
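The cost effect of answer C can be illustrated with a small simulation: lowering the sampling rate shrinks the share of requests the monitoring job logs and analyzes roughly in proportion. The sketch below mimics the semantics of Vertex AI's RandomSampleConfig in plain Python; it is not the SDK itself:

```python
import random

def sample_for_monitoring(requests, sample_rate, seed=42):
    """Keep roughly `sample_rate` of the incoming requests for drift analysis."""
    rng = random.Random(seed)
    return [r for r in requests if rng.random() < sample_rate]

requests = list(range(10_000))
full = sample_for_monitoring(requests, sample_rate=0.8)
reduced = sample_for_monitoring(requests, sample_rate=0.2)
# The monitored volume (and hence cost) drops by roughly 4x, while drift
# statistics are still computed on thousands of sampled requests.
```

Increasing monitor_interval (answer D) would also cut cost, but at the price of slower drift detection, which is why C is preferred here.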

Topic 1 Question 191


Exam Professional Machine Learning Engineer topic 1 question 191 discussion

You work for a retail company. You have created a Vertex AI forecast model that produces monthly item sales predictions. You want to quickly create a report that will help to explain how the model calculates the predictions. You have one month of recent actual sales data that was not included in the training dataset. How should you generate data for your report?

  • A. Create a batch prediction job by using the actual sales data. Compare the predictions to the actuals in the report.
  • B. Create a batch prediction job by using the actual sales data, and configure the job settings to generate feature attributions. Compare the results in the report.
  • C. Generate counterfactual examples by using the actual sales data. Create a batch prediction job using the actual sales data and the counterfactual examples. Compare the results in the report.
  • D. Train another model by using the same training dataset as the original, and exclude some columns. Using the actual sales data create one batch prediction job by using the new model and another one with the original model. Compare the two sets of predictions in the report.
Suggested Answer: B 🗳️

Comments

fitri001
1 year ago
Selected Answer: B
Feature Attribution: By enabling feature attributions in the batch prediction job, you gain insights into how each feature in the actual sales data contributes to the model's predictions. This information is crucial for explaining the model's reasoning to non-technical audiences. Direct Model Insights: Analyzing the feature attributions allows you to demonstrate how the model uses historical trends, seasonality, and other factors (represented by features) to predict future sales.
upvoted 3 times
fitri001
1 year ago
A. Prediction vs. Actuals: While comparing predictions to actuals can be informative, it doesn't directly explain how the model arrives at those predictions. C. Counterfactual Examples: Counterfactuals can be useful for understanding model behavior, but creating them requires additional effort and might not be necessary for explaining the basic prediction process. D. Training a New Model: Training another model is time-consuming and unnecessary. Feature attributions provide valuable insights without needing a separate model.
upvoted 2 times
...
...
MultiCloudIronMan
1 year, 1 month ago
Selected Answer: B
B is the best answer, but I am unsure why the report has to be compared with actual sales.
upvoted 1 times
...
ddogg
1 year, 3 months ago
Selected Answer: B
B) will actually give you the information needed via feature attributions, e.g. the importance of each feature influencing the predicted item sales.
upvoted 3 times
...
pikachu007
1 year, 4 months ago
Selected Answer: B
Feature attributions explicitly measure how much each input feature contributed to each prediction, providing the most relevant insights for understanding model behavior.
upvoted 3 times
...

Topic 1 Question 192


Exam Professional Machine Learning Engineer topic 1 question 192 discussion

Your team has a model deployed to a Vertex AI endpoint. You have created a Vertex AI pipeline that automates the model training process and is triggered by a Cloud Function. You need to prioritize keeping the model up-to-date, but also minimize retraining costs. How should you configure retraining?

  • A. Configure Pub/Sub to call the Cloud Function when a sufficient amount of new data becomes available
  • B. Configure a Cloud Scheduler job that calls the Cloud Function at a predetermined frequency that fits your team’s budget
  • C. Enable model monitoring on the Vertex AI endpoint. Configure Pub/Sub to call the Cloud Function when anomalies are detected
  • D. Enable model monitoring on the Vertex AI endpoint. Configure Pub/Sub to call the Cloud Function when feature drift is detected
Suggested Answer: D 🗳️

Comments

fitri001
1 year ago
Selected Answer: D
Data-driven Retraining: Monitoring for feature drift identifies significant changes in the underlying data distribution used to train the model. Retraining based on drift detection ensures the model stays relevant to evolving data patterns, prioritizing model accuracy. Reduced Cost: Triggering retraining only when drift is detected avoids unnecessary training runs, minimizing costs associated with Vertex AI training jobs.
upvoted 4 times
fitri001
1 year ago
A. New Data Availability: While new data is important, it might not always necessitate retraining, especially if the new data aligns with existing patterns. B. Predetermined Frequency: Fixed scheduling can lead to either under-training (data evolves faster than the schedule) or over-training (drift happens slower than the schedule), potentially wasting resources. C. Anomaly Detection: Anomalies might not directly indicate feature drift, and retraining based solely on anomalies could introduce noise into the model.
upvoted 2 times
...
...
ddogg
1 year, 3 months ago
Selected Answer: D
D) Makes the most sense and scales
upvoted 2 times
...
b1a8fae
1 year, 3 months ago
Selected Answer: D
Keep the model up to date -> monitoring drift (distribution of production data doesnt change wildly). Only rerun training when necessary.
upvoted 1 times
...
pikachu007
1 year, 4 months ago
Selected Answer: D
It proactively triggers retraining when feature drift is detected, ensuring the model adapts to changing data patterns and maintains accuracy.
upvoted 1 times
...
winston9
1 year, 4 months ago
Selected Answer: D
feature drifting detecting to trigger retraining
upvoted 1 times
...
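A minimal sketch of the Cloud Function side of answer D: a Pub/Sub-triggered function decodes the alert and kicks off retraining only when feature drift is reported. The 'anomaly_type' payload field and the commented-out pipeline call are hypothetical illustrations — real Model Monitoring alert schemas and pipeline-run code will differ:

```python
import base64
import json

def should_retrain(event: dict) -> bool:
    """Decode a Pub/Sub push-style event and decide whether to retrain.

    The 'anomaly_type' field is a hypothetical payload shape used for
    illustration, not the exact Model Monitoring alert schema.
    """
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    return payload.get("anomaly_type") == "feature_drift"

def handler(event, context=None):
    """Cloud Function entry point (sketch)."""
    if should_retrain(event):
        # Here one would submit the Vertex AI pipeline run, e.g. via the
        # google-cloud-aiplatform PipelineJob API.
        return "retraining triggered"
    return "no action"

# Example drift alert, encoded the way Pub/Sub delivers message data:
alert = {"data": base64.b64encode(
    json.dumps({"anomaly_type": "feature_drift"}).encode()).decode()}
```

This is why D minimizes cost: the (potentially expensive) training pipeline only runs when the monitored distribution has actually shifted.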

Topic 1 Question 193

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 193 discussion

Your company stores a large number of audio files of phone calls made to your customer call center in an on-premises database. Each audio file is in wav format and is approximately 5 minutes long. You need to analyze these audio files for customer sentiment. You plan to use the Speech-to-Text API. You want to use the most efficient approach. What should you do?

  • A. 1. Upload the audio files to Cloud Storage
    2. Call the speech:longrunningrecognize API endpoint to generate transcriptions
    3. Call the predict method of an AutoML sentiment analysis model to analyze the transcriptions.
  • B. 1. Upload the audio files to Cloud Storage.
    2. Call the speech:longrunningrecognize API endpoint to generate transcriptions
    3. Create a Cloud Function that calls the Natural Language API by using the analyzeSentiment method
  • C. 1. Iterate over your local files in Python
    2. Use the Speech-to-Text Python library to create a speech.RecognitionAudio object, and set the content to the audio file data
    3. Call the speech:recognize API endpoint to generate transcriptions
    4. Call the predict method of an AutoML sentiment analysis model to analyze the transcriptions.
  • D. 1. Iterate over your local files in Python
    2. Use the Speech-to-Text Python Library to create a speech.RecognitionAudio object and set the content to the audio file data
    3. Call the speech:longrunningrecognize API endpoint to generate transcriptions.
    4. Call the Natural Language API by using the analyzeSentiment method
Suggested Answer: B 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 9 months ago
Selected Answer: B
My answer: B According to https://cloud.google.com/vertex-ai/docs/text-data/sentiment-analysis/prepare-data, AutoML sentiment analysis requires a minimum of 10 labeled training documents per sentiment category, with a maximum of 100,000 total training documents. This means you need to ensure you have an adequate amount of labeled data to train a reliable model. Therefore, option B is more suitable since the API will return the sentiment and there is no mention of a customized problem that justifies the use of AutoML.
upvoted 8 times
...
BlehMaks
Highly Voted 1 year, 10 months ago
Selected Answer: B
Vertex AI AutoML is overkill, as the built-in NL API provides sentiment analysis.
upvoted 5 times
...
OpenKnowledge
Most Recent 1 month, 2 weeks ago
Selected Answer: B
For sentiment analysis, the choice between an AutoML platform and a direct NLP API depends on the specific requirements and resources available. For the problem in this question, AutoML training might be overkill: custom training does not seem necessary, and the NLP API can perform the sentiment analysis directly. For general sentiment analysis without specific domain requirements, an off-the-shelf NLP API with a pre-trained model is typically the most efficient and straightforward solution. For sentiment analysis in a specific domain with unique language or nuances, training a custom model with AutoML will likely yield more accurate and relevant results, provided the necessary data and resources are available.
upvoted 1 times
...
el_vampiro
2 months ago
Selected Answer: D
A & B do not mention iterating over the files in GCS or triggering a Cloud Function for each object being added. That implies one Cloud Function invocation to process everything, which won't work. Yet another bizarre question.
upvoted 1 times
...
rajshiv
11 months, 1 week ago
Selected Answer: A
It is definitely A and not B. Natural Language API can be used for sentiment analysis but it will require an additional Cloud Function to handle the sentiment analysis, which adds complexity and overhead. Since AutoML models are built specifically for sentiment analysis tasks, using AutoML directly is more efficient.
upvoted 2 times
...
pinimichele01
1 year, 6 months ago
Selected Answer: B
Agree with guilhermebutzke
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: B
Scalability: Uploading audio files to Cloud Storage provides a scalable and reliable storage solution for your large dataset. Asynchronous Processing: The speech:longrunningrecognize API enables asynchronous transcription, allowing your code to proceed without waiting for each file to finish processing. This improves overall throughput. Managed Service: Cloud Functions are serverless functions that automatically scale to handle the workload. You don't need to manage servers or infrastructure. Natural Language API Integration: The Cloud Function can directly call the Natural Language API's analyzeSentiment method for sentiment analysis, streamlining the workflow.
upvoted 2 times
fitri001
1 year, 6 months ago
C & D: Local Processing: Iterating over local files and using the Speech-to-Text Python library might be suitable for small datasets. However, for a large number of audio files, local processing becomes slow and inefficient, especially for long audio files (5 minutes). C: Speech-to-Text API Limitation: The speech:recognize API is designed for short audio snippets (less than a minute) and might not be suitable for 5-minute audio files.
upvoted 1 times
...
...
gscharly
1 year, 6 months ago
Selected Answer: B
Agree with guilhermebutzke
upvoted 1 times
...
ddogg
1 year, 9 months ago
Selected Answer: A
A) Efficiency: Option A leverages the optimized and scalable infrastructure of Google Cloud Platform (GCP). Using the speech:longrunningrecognize API allows you to transcribe large audio files efficiently without overwhelming your local machine or network. Cost-effectiveness: Paying for processing in Cloud Storage can be more cost-effective than performing it locally, especially for large datasets. Ease of use: The Cloud Storage and Speech-to-Text APIs are well-documented and provide readily available libraries for easy integration. Scalability: This approach scales easily as your dataset grows, as GCP can handle large workloads efficiently.
upvoted 2 times
...
shadz10
1 year, 9 months ago
Selected Answer: A
Reconsidering: as the question states a large dataset, going with option A.
upvoted 1 times
...
shadz10
1 year, 10 months ago
Selected Answer: B
I'm going with B and agree with BlehMaks. For your convenience, the Natural Language API can perform sentiment analysis directly on a file located in Cloud Storage, without the need to send the contents of the file in the body of your request. Google's best practice: try the API first, then AutoML, then custom training. https://cloud.google.com/natural-language/docs/analyzing-sentiment
upvoted 1 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: A
A. It must be longrunningrecognize -> not C. No point iterating over local files in Python -> not D. Final question: NL analyzeSentiment or AutoML sentiment? I feel that due to the large dataset Vertex AI AutoML is the way to go (it can scale to large volumes of data).
upvoted 1 times
b1a8fae
1 year, 10 months ago
More info: what natural language is right for you? https://cloud.google.com/natural-language?hl=en
upvoted 1 times
...
...
36bdc1e
1 year, 10 months ago
B. Because we don't need to train a model; just use the Google APIs to transcribe and run sentiment analysis.
upvoted 2 times
...
pikachu007
1 year, 10 months ago
Selected Answer: A
Efficient audio processing: speech:longrunningrecognize is specifically designed for handling large audio files, offering asynchronous processing and optimized performance. Scalability: Cloud Storage and Vertex AI AutoML scale seamlessly to handle large volumes of data and model inferences. Cost-effectiveness: Separating transcription and sentiment analysis allows for potential cost optimization by using different pricing models for each service.
upvoted 2 times
...
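Answer B's flow can be sketched with the real client libraries: speech:longrunningrecognize for the 5-minute files in Cloud Storage, then the Natural Language API's analyzeSentiment on the transcripts. This is a hedged sketch — it assumes google-cloud-speech and google-cloud-language are installed and credentials are configured, so the GCP imports are kept inside the functions; the label thresholds in the last helper are illustrative, not API-defined:

```python
def transcribe_gcs(gcs_uri: str) -> str:
    """Asynchronously transcribe a long WAV file already uploaded to GCS."""
    from google.cloud import speech  # requires google-cloud-speech + credentials
    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        language_code="en-US",
    )
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=600)
    return " ".join(r.alternatives[0].transcript for r in response.results)

def call_center_sentiment(text: str) -> float:
    """Score a transcript with the Natural Language API's analyzeSentiment."""
    from google.cloud import language_v1  # requires google-cloud-language
    client = language_v1.LanguageServiceClient()
    doc = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT)
    return client.analyze_sentiment(document=doc).document_sentiment.score

def sentiment_label(score: float, threshold: float = 0.25) -> str:
    """Map an analyzeSentiment score in [-1, 1] to a coarse label
    (thresholds are illustrative, not part of the API)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"
```

The key efficiency points from the thread are visible here: the files live in GCS (no local iteration), and the long-running variant of the API handles audio longer than one minute.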

Topic 1 Question 194


Exam Professional Machine Learning Engineer topic 1 question 194 discussion

You work for a social media company. You want to create a no-code image classification model for an iOS mobile application to identify fashion accessories. You have a labeled dataset in Cloud Storage. You need to configure a training workflow that minimizes cost and serves predictions with the lowest possible latency. What should you do?

  • A. Train the model by using AutoML, and register the model in Vertex AI Model Registry. Configure your mobile application to send batch requests during prediction.
  • B. Train the model by using AutoML Edge, and export it as a Core ML model. Configure your mobile application to use the .mlmodel file directly.
  • C. Train the model by using AutoML Edge, and export the model as a TFLite model. Configure your mobile application to use the .tflite file directly.
  • D. Train the model by using AutoML, and expose the model as a Vertex AI endpoint. Configure your mobile application to invoke the endpoint during prediction.
Suggested Answer: B 🗳️

Comments

forport
1 year, 3 months ago
Selected Answer: B
'for iOS mobile' = Edge; '.mlmodel directly' = minimizes the cost.
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: B
No-code Training: AutoML Edge simplifies model training without needing extensive coding knowledge. On-device Processing: Core ML models run directly on the iOS device, minimizing latency by eliminating the need for network calls to a cloud endpoint. Cost-effective: Training on AutoML Edge and deploying the model on the device avoids ongoing costs associated with Vertex AI endpoints.
upvoted 4 times
fitri001
1 year, 6 months ago
A. AutoML with Batch Requests: While AutoML offers powerful model training, batch requests for prediction still incur network latency and might not be ideal for real-time mobile applications. C & D. TFLite and Vertex AI Endpoint: Both TFLite and Vertex AI endpoints are viable options, but they require additional steps for mobile integration compared to Core ML, which is native to iOS. Additionally, a Vertex AI endpoint introduces cloud communication and potential costs.
upvoted 2 times
...
...
pinimichele01
1 year, 7 months ago
Selected Answer: B
Core ML is specifically designed for iOS devices, ensuring efficient inference and low latency.
upvoted 2 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: B
My Answer: B AutoML Edge or Vertex AI endpoint?: This option is specifically designed for training models that run on edge devices like mobile phones. It optimizes models for size and efficiency, minimizing cost and latency. While AutoML can train the model, using a Vertex AI endpoint adds unnecessary overhead and potential latency for mobile predictions. Batch requests wouldn't significantly improve latency here. Core ML or TFLite: While TFLite is compatible with some mobile platforms, Core ML is specifically designed for iOS and offers better performance and integration.
upvoted 2 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: B
B. Confused, as AutoML Vision Edge seems like the right tool for this problem but is deprecated according to the docs: https://firebase.google.com/docs/ml/automl-image-labeling I will assume that the question needs updating but we should go with that; Core ML is specifically designed for iOS apps. https://www.netguru.com/blog/coreml-vs-tensorflow-lite-mobile
upvoted 1 times
...
BlehMaks
1 year, 10 months ago
Selected Answer: B
it's possible to use either Core ML or TF Lite, but since it's necessary to ensure the lowest possible latency, choose Core ML https://cloud.google.com/vertex-ai/docs/export/export-edge-model#classification
upvoted 2 times
...
36bdc1e
1 year, 10 months ago
B. For no-code, AutoML is the best; to minimize cost we export as a Core ML model.
upvoted 1 times
...
pikachu007
1 year, 10 months ago
Selected Answer: B
No-code model development: AutoML Edge provides a no-code interface for model training, aligning with the requirement. Optimized for mobile devices: Core ML is specifically designed for iOS devices, ensuring efficient inference and low latency. Offline capability: The app can run predictions locally without requiring network calls, reducing costs and ensuring availability even without internet connectivity. No ongoing endpoint costs: Unlike using a Vertex AI endpoint, there are no extra costs associated with hosting and serving the model.
upvoted 2 times
...
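For answer B, the exported artifact depends on the target platform: the Vertex AI SDK's Model.export_model takes an export format id such as 'core-ml' (iOS) or 'tflite' (Android). A sketch of that choice — the model resource name and bucket are placeholders, and the export call assumes google-cloud-aiplatform is installed and configured:

```python
def edge_export_format(platform: str) -> str:
    """Map a target mobile platform to a Vertex AI edge export format id."""
    formats = {"ios": "core-ml", "android": "tflite"}
    return formats[platform.lower()]

def export_edge_model(model_resource_name: str, platform: str, gcs_dir: str):
    """Export an AutoML Edge model for on-device use (sketch; needs credentials)."""
    from google.cloud import aiplatform  # requires google-cloud-aiplatform
    model = aiplatform.Model(model_resource_name)
    return model.export_model(
        export_format_id=edge_export_format(platform),
        artifact_destination=gcs_dir,  # e.g. "gs://my-bucket/exports/"
    )
```

Bundling the exported .mlmodel file into the app gives on-device inference, which is what delivers the lowest latency and avoids endpoint-hosting costs.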

Topic 1 Question 195


Exam Professional Machine Learning Engineer topic 1 question 195 discussion

You work for a retail company. You have been asked to develop a model to predict whether a customer will purchase a product on a given day. Your team has processed the company’s sales data, and created a table with the following rows:
• Customer_id
• Product_id
• Date
• Days_since_last_purchase (measured in days)
• Average_purchase_frequency (measured in 1/days)
• Purchase (binary class, if customer purchased product on the Date)

You need to interpret your model’s results for each individual prediction. What should you do?

  • A. Create a BigQuery table. Use BigQuery ML to build a boosted tree classifier. Inspect the partition rules of the trees to understand how each prediction flows through the trees.
  • B. Create a Vertex AI tabular dataset. Train an AutoML model to predict customer purchases. Deploy the model to a Vertex AI endpoint and enable feature attributions. Use the “explain” method to get feature attribution values for each individual prediction.
  • C. Create a BigQuery table. Use BigQuery ML to build a logistic regression classification model. Use the values of the coefficients of the model to interpret the feature importance, with higher values corresponding to more importance
  • D. Create a Vertex AI tabular dataset. Train an AutoML model to predict customer purchases. Deploy the model to a Vertex AI endpoint. At each prediction, enable L1 regularization to detect non-informative features.
Suggested Answer: B 🗳️

Comments

LaxmanTiwari
1 year, 4 months ago
Selected Answer: B
For the "simplest approach," option B is the best choice.
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: B
Individual Prediction Explanation: Vertex AI feature attributions provide insights into how each feature (e.g., days_since_last_purchase, average_purchase_frequency) contributes to a specific prediction for a customer-product combination. This allows you to understand the rationale behind the model's prediction for each instance. AutoML Convenience: AutoML simplifies model training without extensive configuration.
upvoted 3 times
fitri001
1 year, 6 months ago
A. BigQuery ML with Boosted Trees: While BigQuery ML can build boosted tree models, interpreting individual predictions by inspecting partition rules can be cumbersome and less intuitive compared to feature attributions. C. BigQuery ML Logistic Regression: Logistic regression coefficients indicate feature importance, but they don't directly explain how a specific feature value influences a single prediction. D. L1 Regularization: L1 regularization can help identify potentially unimportant features during training, but it doesn't directly explain individual predictions.
upvoted 2 times
...
...
ddogg
1 year, 9 months ago
Selected Answer: B
Vertex AI feature attributions: This is the most direct approach. By enabling feature attributions, you get explanations for each prediction, highlighting how individual features contribute to the model's output. This is crucial for understanding specific customer purchase predictions.
upvoted 1 times
...
BlehMaks
1 year, 10 months ago
Selected Answer: B
B is correct
upvoted 2 times
...
36bdc1e
1 year, 10 months ago
B. Local interpretability: we use the "explain" method to get feature attribution values for each individual prediction.
upvoted 1 times
...
pikachu007
1 year, 10 months ago
Selected Answer: B
Individual prediction interpretability: Feature attributions specifically address the need to understand how features contribute to individual predictions, providing fine-grained insights. Vertex AI integration: Vertex AI offers seamless integration of feature attributions with AutoML models, simplifying the process. Model flexibility: AutoML can explore various model architectures, potentially finding the most suitable one for this task, while still providing interpretability.
upvoted 1 times
...
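The "explain" method from answer B returns per-instance feature attributions; a small helper can then surface the most influential features for the report. The endpoint call below is a sketch assuming the google-cloud-aiplatform SDK, and the attribution values in the example dict are made up for illustration, not real model output:

```python
def top_features(attributions: dict, k: int = 2) -> list:
    """Rank features by absolute attribution value for one prediction."""
    return sorted(attributions, key=lambda f: abs(attributions[f]), reverse=True)[:k]

def explain_instances(endpoint_resource_name: str, instances: list):
    """Request explanations from a deployed Vertex AI endpoint (needs credentials)."""
    from google.cloud import aiplatform  # requires google-cloud-aiplatform
    endpoint = aiplatform.Endpoint(endpoint_resource_name)
    return endpoint.explain(instances=instances)

# Made-up attribution values for a single customer/product prediction:
example = {
    "days_since_last_purchase": -0.40,
    "average_purchase_frequency": 0.70,
    "product_id": 0.05,
}
```

This per-prediction ranking is exactly what options A, C, and D cannot give: tree partition rules and regression coefficients describe the model globally, not an individual prediction.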

Topic 1 Question 196


Exam Professional Machine Learning Engineer topic 1 question 196 discussion

You work for a company that captures live video footage of checkout areas in their retail stores. You need to use the live video footage to build a model to detect the number of customers waiting for service in near real time. You want to implement a solution quickly and with minimal effort. How should you build the model?

  • A. Use the Vertex AI Vision Occupancy Analytics model.
  • B. Use the Vertex AI Vision Person/vehicle detector model.
  • C. Train an AutoML object detection model on an annotated dataset by using Vertex AutoML.
  • D. Train a Seq2Seq+ object detection model on an annotated dataset by using Vertex AutoML.
Suggested Answer: A 🗳️

Comments

fitri001
Highly Voted 1 year, 6 months ago
Selected Answer: A
A. Use the Vertex AI Vision Occupancy Analytics model: This is a pre-built model specifically designed for analyzing occupancy in videos. It's ideal for this scenario as it requires minimal configuration and can likely be deployed quickly.
upvoted 5 times
fitri001
1 year, 6 months ago
C. Train an AutoML object detection model: While this could be a good solution in the long run, training a custom model requires creating an annotated dataset and takes time. D. Seq2Seq+ object detection model: This is an overly complex approach for this task. Seq2Seq models are used for sequence-to-sequence prediction tasks and are not necessary here.
upvoted 2 times
...
...
guilhermebutzke
Highly Voted 1 year, 9 months ago
Selected Answer: A
My Answer: A. Vertex AI Vision Occupancy Analytics is a pre-trained model specifically designed to count people in live video streams. This removes the need for expensive and time-consuming data labeling and training, making it ideal for quick implementation. The Vertex AI Vision Person/Vehicle Detector model detects individual people and vehicles without specifically focusing on occupancy counting; it would require further processing to estimate the number of waiting customers. Options C and D require labeling data and training, which adds effort and time. https://cloud.google.com/vision-ai/docs/overview
upvoted 5 times
...
tardigradum
Most Recent 1 year, 3 months ago
Selected Answer: A
It makes sense to use Vertex AI Vision Occupancy to reduce the effort of obtaining a model that identifies the number of people in a video, although I am hesitant about the fact that it says 'BUILD a model' and strictly speaking, no model is actually built with that option.
upvoted 1 times
...
Prakzz
1 year, 4 months ago
Selected Answer: B
https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/vehicle-detector Occupancy analytics has other features too, like zone detection, dwell time, and more, which are not needed in this scenario.
upvoted 2 times
...
ddogg
1 year, 9 months ago
Selected Answer: A
A. Use the Vertex AI Vision Occupancy Analytics model. Here's why: Pre-trained and optimized: Occupancy Analytics is a pre-trained and optimized model specifically designed for counting people in video footage, aligning perfectly with your task. This eliminates the need for extensive data collection, annotation, and training, saving time and effort. Near real-time performance: The model is designed for low latency and near real-time inference, providing results quickly with minimal delay, important for live video analysis. Minimal configuration: Compared to training your own model, this option requires minimal configuration within the Vertex AI console, allowing for a quicker setup and deployment.
upvoted 2 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: B
All you need is counting the number of customers in the video stream. I would say no need to have the extra functionalities of occupancy analytics, person/vehicle is enough for this use case. https://cloud.google.com/vision-ai/docs/person-vehicle-model
upvoted 1 times
...
winston9
1 year, 10 months ago
Selected Answer: A
https://codelabs.developers.google.com/vertex-ai-vision-queue-detection#0
upvoted 3 times
...

Topic 1 Question 197


Exam Professional Machine Learning Engineer topic 1 question 197 discussion

You work as an analyst at a large banking firm. You are developing a robust, scalable ML pipeline to train several regression and classification models. Your primary focus for the pipeline is model interpretability. You want to productionize the pipeline as quickly as possible. What should you do?

  • A. Use Tabular Workflow for Wide & Deep through Vertex AI Pipelines to jointly train wide linear models and deep neural networks
  • B. Use Google Kubernetes Engine to build a custom training pipeline for XGBoost-based models
  • C. Use Tabular Workflow for TabNet through Vertex AI Pipelines to train attention-based models
  • D. Use Cloud Composer to build the training pipelines for custom deep learning-based models
Suggested Answer: C 🗳️

Comments

OpenKnowledge
1 month, 2 weeks ago
Selected Answer: C
The TabNet model is a deep neural network designed for tabular data that combines high performance with inherent model interpretability. It achieves this through a sequential attention mechanism that selectively chooses which features to use at each step of the model, allowing for understandable feature attributions and improved accuracy compared to other models on various real-world datasets.
upvoted 2 times
...
guilhermebutzke
1 year, 3 months ago
Selected Answer: C
My Answer: C Link: https://cloud.google.com/vertex-ai/docs/tabular-data/tabular-workflows/overview
upvoted 3 times
...
ddogg
1 year, 3 months ago
Selected Answer: C
https://www.sciencedirect.com/science/article/pii/S0957417423000441 • When compared to XGBoost & GLM, TabNet provides better or comparable performance. • Unlike other Deep Learning models, TabNet is highly interpretable.
upvoted 3 times
...
sonicclasps
1 year, 3 months ago
Selected Answer: C
agree, C, as this is specifically one of Tabnet's strengths
upvoted 1 times
...
winston9
1 year, 4 months ago
Selected Answer: C
according to the documentation: "TabNet uses sequential attention to choose which features to reason from at each decision step. This promotes interpretability and more efficient learning because the learning capacity is used for the most salient features."
upvoted 1 times
...
pikachu007
1 year, 4 months ago
Selected Answer: C
TabNet models are inherently more interpretable than deep neural networks or XGBoost models due to their attention mechanism. This aligns with the primary focus on interpretability.
upvoted 1 times
...
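Several comments above point at TabNet's sequential attention masks as the source of its interpretability. As a rough, framework-free illustration (a toy, not the actual TabNet implementation, which uses learned sparsemax masks per decision step), a feature mask normalized with softmax yields per-feature attribution weights that sum to 1:

```python
import math

def softmax(scores):
    """Normalize raw scores into attention weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def masked_decision_step(features, mask_scores):
    """Toy 'decision step': weight each feature by its attention mask.

    In TabNet the mask is learned and made sparse per step; here the
    scores are fixed constants, just to show that the mask itself is
    the interpretability signal (per-feature attribution weights).
    """
    mask = softmax(mask_scores)
    weighted = [f * w for f, w in zip(features, mask)]
    return mask, sum(weighted)

features = [0.9, 0.1, 0.4]     # e.g. income, age, balance (hypothetical)
mask_scores = [2.0, 0.1, 0.5]  # hypothetical learned scores
mask, output = masked_decision_step(features, mask_scores)
print(mask)  # largest weight lands on feature 0 -> most influential feature
```

Reading the masks across decision steps is what gives TabNet the "understandable feature attributions" the OpenKnowledge comment describes.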

Topic 1 Question 198


Exam Professional Machine Learning Engineer topic 1 question 198 discussion

You developed a Transformer model in TensorFlow to translate text. Your training data includes millions of documents in a Cloud Storage bucket. You plan to use distributed training to reduce training time. You need to configure the training job while minimizing the effort required to modify code and to manage the cluster’s configuration. What should you do?

  • A. Create a Vertex AI custom training job with GPU accelerators for the second worker pool. Use tf.distribute.MultiWorkerMirroredStrategy for distribution.
  • B. Create a Vertex AI custom distributed training job with Reduction Server. Use N1 high-memory machine type instances for the first and second pools, and use N1 high-CPU machine type instances for the third worker pool.
  • C. Create a training job that uses Cloud TPU VMs. Use tf.distribute.TPUStrategy for distribution.
  • D. Create a Vertex AI custom training job with a single worker pool of A2 GPU machine type instances. Use tf.distribute.MirroredStrategy for distribution.
Suggested Answer: A 🗳️

Comments

fitri001
1 year ago
Selected Answer: A
Vertex AI custom training job: This leverages a managed service within GCP, reducing cluster configuration and management overhead. GPU accelerators for the second worker pool: This allows for distributed training across multiple GPUs, significantly speeding up training compared to a single worker pool. tf.distribute.MultiWorkerMirroredStrategy: This is a TensorFlow strategy specifically designed for distributed training on multiple machines. It minimizes code changes as it handles data parallelization and model replication across devices.
upvoted 4 times
ricardovazz
4 months, 3 weeks ago
Why not D? Doesn't A require multi-node cluster configuration, increasing setup complexity and requiring more code adjustments?
upvoted 1 times
...
fitri001
1 year ago
B. Reduction Server: While Vertex AI supports Reduction Servers, it's generally not required for text translation with Transformers. It's more commonly used for distributed training with specific model architectures. C. Cloud TPU VMs: While Cloud TPUs offer excellent performance, they require significant code modifications to work with Transformer models in TensorFlow. Additionally, managing Cloud TPU VMs involves more complexity compared to Vertex AI custom training jobs. D. Single worker pool: This limits training to a single machine, negating the benefits of distributed training.
upvoted 2 times
...
...
Carlose2108
1 year, 2 months ago
Why not C?
upvoted 2 times
tavva_prudhvi
1 year ago
Yeah, but as the question mentions "minimizing the effort required to modify code and to manage the cluster’s configuration", and TPus may require specific adaptations in the model code to fully exploit TPU capabilities.
upvoted 2 times
...
pinimichele01
1 year, 1 month ago
for me is C
upvoted 1 times
...
...
guilhermebutzke
1 year, 2 months ago
Selected Answer: A
My Answer: A - Distributed training: Utilizes GPUs in 2nd worker pool for speedup. - Minimal code changes: Vertex AI custom job for ease of use. - Managed cluster: No manual configuration needed. Other options: - B: Complex setup with different machine types and Reduction Server. - C: TPUs may not be optimal for Transformers and require code changes. - D: Lacks distributed training, limiting speed improvement.
upvoted 2 times
...
pikachu007
1 year, 4 months ago
Selected Answer: A
Minimizes code modification: MultiWorkerMirroredStrategy often requires minimal code changes to distribute training across multiple workers, aligning with the goal of minimizing effort. Simplifies cluster management: Vertex AI handles cluster configuration and scaling for custom training jobs, reducing the need for manual management. Effective distributed training: MultiWorkerMirroredStrategy is well-suited for large models and datasets, efficiently distributing training across GPUs.
upvoted 3 times
...
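As context for why option A needs so little code change: Vertex AI sets the `TF_CONFIG` environment variable on each replica of a multi-worker custom training job, and `tf.distribute.MultiWorkerMirroredStrategy` reads it automatically. A minimal sketch of the variable's shape (the hostnames below are illustrative placeholders, not real endpoints):

```python
import json
import os

# Shape of TF_CONFIG as provided to each replica of a multi-worker
# training job. Hostnames are made-up placeholders for this sketch.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "chief": ["training-chief-0:2222"],
        "worker": ["training-worker-0:2222", "training-worker-1:2222"],
    },
    "task": {"type": "worker", "index": 0},  # identity of this process
})

tf_config = json.loads(os.environ["TF_CONFIG"])
n_replicas = sum(len(hosts) for hosts in tf_config["cluster"].values())
print(f"{n_replicas} replicas; this process is "
      f"{tf_config['task']['type']} #{tf_config['task']['index']}")

# With the variable in place, the only training-code change is roughly:
#   strategy = tf.distribute.MultiWorkerMirroredStrategy()
#   with strategy.scope():
#       model = build_model()
```

Because the cluster topology arrives via the environment, the model code itself stays almost unchanged, which is the "minimal effort" property the question asks for.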

Topic 1 Question 199


Exam Professional Machine Learning Engineer topic 1 question 199 discussion

You are developing a process for training and running your custom model in production. You need to be able to show lineage for your model and predictions. What should you do?

  • A. 1. Create a Vertex AI managed dataset.
    2. Use a Vertex AI training pipeline to train your model.
    3. Generate batch predictions in Vertex AI.
  • B. 1. Use a Vertex AI Pipelines custom training job component to train your model.
    2. Generate predictions by using a Vertex AI Pipelines model batch predict component.
  • C. 1. Upload your dataset to BigQuery.
    2. Use a Vertex AI custom training job to train your model.
    3. Generate predictions by using Vertex Al SDK custom prediction routines.
  • D. 1. Use Vertex AI Experiments to train your model.
    2. Register your model in Vertex AI Model Registry.
    3. Generate batch predictions in Vertex AI.
Suggested Answer: B 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 9 months ago
Selected Answer: D
My Answer: D According with: https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments “Vertex AI Experiments is a tool that helps you track and analyze different model architectures, hyperparameters, and training environments, letting you track the steps, inputs, and outputs of an experiment run. Vertex AI Experiments can also evaluate how your model performed in aggregate, against test datasets, and during the training run. You can then use this information to select the best model for your particular use case.”. Considering that both options A and B could demonstrate some form of lineage, I believe option D is the most suitable. The text explicitly states "show lineage for your model and predictions," which aligns perfectly with the functionality provided by Vertex AI Experiments.
upvoted 10 times
...
edoo
Highly Voted 1 year, 8 months ago
Selected Answer: B
Vertex AI Pipelines are suited to do artifact lineage https://cloud.google.com/vertex-ai/docs/pipelines/lineage Experiments can do it also, but their main goal is to "track and analyze different model architectures, hyperparameters, and training environments"
upvoted 7 times
...
bc3f222
Most Recent 8 months ago
Selected Answer: D
Vertex AI Experiments helps track all your training runs, including dataset version, hyperparameters, model metrics, and code version. This enables full lineage and traceability from data → training → model artifact.
upvoted 2 times
...
Ankit267
10 months, 2 weeks ago
Selected Answer: B
Answer is B. D is wrong: there is only one model, not several, and Experiments is used for multiple runs of a model or for multiple models. Also, lineage is tracked using a pipeline.
upvoted 1 times
...
rajshiv
11 months, 1 week ago
Selected Answer: B
Vertex AI Pipelines will track the Model lineage while the batch prediction component in Vertex AI Pipelines will provide lineage tracking because each prediction is part of the pipeline and is connected to the corresponding training process.
upvoted 1 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: B
Vertex AI Pipeline for lineage tracking
upvoted 1 times
...
Foxy2021
1 year, 1 month ago
My answer is B.
upvoted 1 times
...
baimus
1 year, 2 months ago
This question is a bit ambiguously worded. Model lineage involves knowledge of the data the model was trained on, so that should be A. That being said, I think the question is implying D from its wording, experiment tracking. I went for A, but suspect it's wrong.
upvoted 1 times
...
SahandJ
1 year, 6 months ago
Selected Answer: D
Option A/B doesn't mention anything about lineage. C is definitely wrong as there is no need to upload the dataset to Bigquery. Only correct answer is D
upvoted 2 times
...
pinimichele01
1 year, 6 months ago
Selected Answer: B
running your custom model in production -> need pipeline -> B
upvoted 1 times
...
cruise93
1 year, 6 months ago
Selected Answer: D
Agree with guilhermebutzke
upvoted 2 times
...
Shark0
1 year, 7 months ago
Selected Answer: A
A because to track lineage you need a managed dataset and vertex ai pipelines
upvoted 1 times
pinimichele01
1 year, 7 months ago
Lineage of the model, I think, not of the data, so it's B.
upvoted 1 times
...
...
Yan_X
1 year, 8 months ago
Selected Answer: A
A. D cannot provide lineage for the source of your data. It has to be A, to go with a Vertex AI managed dataset.
upvoted 1 times
...
sonicclasps
1 year, 9 months ago
Selected Answer: A
Managed data set to help track lineage https://cloud.google.com/vertex-ai/docs/training/using-managed-datasets
upvoted 1 times
...
ddogg
1 year, 9 months ago
Selected Answer: B
B) REF https://cloud.google.com/vertex-ai/docs/pipelines/lineage Track the lineage of pipeline artifacts When you run a pipeline using Vertex AI Pipelines, the artifacts and parameters of your pipeline run are stored using Vertex ML Metadata. Vertex ML Metadata makes it easier to analyze the lineage of your pipeline's artifacts, by saving you the difficulty of keeping track of your pipeline's metadata. An artifact's lineage includes all the factors that contributed to its creation, as well as artifacts and metadata that are derived from this artifact. For example, a model's lineage could include the following: The training, test, and evaluation data used to create the model. The hyperparameters used during model training. Metadata recorded from the training and evaluation process, such as the model's accuracy. Artifacts that descend from this model, such as the results of batch predictions.
upvoted 5 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: D
D. Sample on how to keep track of experiments lineage -> https://cloud.google.com/vertex-ai/docs/experiments/user-journey/uj-model-training
upvoted 1 times
...
BlehMaks
1 year, 10 months ago
Selected Answer: B
Vertex AI Pipelines provides ability to track the lineage for your model and predictions
upvoted 1 times
...
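To make the lineage argument for B concrete: when the training step and the batch-predict step both run inside one Vertex AI pipeline, Vertex ML Metadata links every artifact to the step that produced it and to the artifacts that step consumed. A stdlib-only toy of that artifact graph (an illustration of the idea, not the real Vertex ML Metadata API):

```python
# Toy lineage store: each artifact records the step that produced it
# and the input artifacts that step consumed -- the relationships that
# Vertex ML Metadata tracks automatically for Vertex AI Pipelines runs.
lineage = {}

def record(artifact, produced_by, inputs=()):
    lineage[artifact] = {"step": produced_by, "inputs": list(inputs)}

def ancestors(artifact):
    """Walk the graph upstream to recover an artifact's full lineage."""
    seen = []
    stack = [artifact]
    while stack:
        for parent in lineage[stack.pop()]["inputs"]:
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen

# Hypothetical run of option B's pipeline:
record("dataset", produced_by="export")
record("model", produced_by="custom-training-job", inputs=["dataset"])
record("predictions", produced_by="model-batch-predict", inputs=["model"])

print(ancestors("predictions"))  # ['model', 'dataset']
```

Starting from any batch-prediction artifact, you can answer "which model, trained on which data?", which is exactly the lineage requirement in the question.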

Topic 1 Question 200


Exam Professional Machine Learning Engineer topic 1 question 200 discussion

You work for a hotel and have a dataset that contains customers’ written comments scanned from paper-based customer feedback forms, which are stored as PDF files. Every form has the same layout. You need to quickly predict an overall satisfaction score from the customer comments on each form. How should you accomplish this task?

  • A. Use the Vision API to parse the text from each PDF file. Use the Natural Language API analyzeSentiment feature to infer overall satisfaction scores.
  • B. Use the Vision API to parse the text from each PDF file. Use the Natural Language API analyzeEntitySentiment feature to infer overall satisfaction scores.
  • C. Uptrain a Document AI custom extractor to parse the text in the comments section of each PDF file. Use the Natural Language API analyzeSentiment feature to infer overall satisfaction scores.
  • D. Uptrain a Document AI custom extractor to parse the text in the comments section of each PDF file. Use the Natural Language API analyzeEntitySentiment feature to infer overall satisfaction scores.
Suggested Answer: C 🗳️

Comments

fitri001
Highly Voted 1 year, 6 months ago
Selected Answer: C
Document AI custom extractor: Since the layout of the feedback forms is consistent, training a custom extractor in Document AI allows for efficient and accurate extraction of the specific comments section. This ensures the Natural Language API receives the relevant text for sentiment analysis. Natural Language API - analyzeSentiment: This functionality within the Natural Language API is specifically designed to analyze sentiment in a piece of text. It provides an overall sentiment score that can be mapped to a satisfaction score (e.g., high positive sentiment translates to high satisfaction).
upvoted 6 times
fitri001
1 year, 6 months ago
A. Vision API - parseText: While the Vision API can extract text from PDFs, it wouldn't necessarily target the specific comments section without a custom parser. B. Natural Language API - analyzeEntitySentiment: This feature focuses on sentiment analysis for named entities within the text. It might not be ideal for overall satisfaction extraction from general customer comments.
upvoted 4 times
...
...
Kalai_1
Most Recent 10 months, 2 weeks ago
Selected Answer: C
Document AI is the best fit for this use case.
upvoted 1 times
...
Ankit267
10 months, 3 weeks ago
Selected Answer: A
"quickly" is the differentiator between A & C
upvoted 1 times
...
Pau1234
11 months ago
Selected Answer: C
DocumentAI is perfect for the case. Since the question says: "overall satisfaction", then entity is not needed.
upvoted 1 times
...
lunalongo
11 months, 1 week ago
Selected Answer: A
In summary, option A offers the optimal balance of speed, accuracy, and simplicity for this specific task. Using the pre-trained APIs is faster and requires less expertise than training a custom model. The analyzeSentiment function directly addresses the need for an overall satisfaction score. Why not D? If speed is the absolute priority and the layout is truly consistent, the Vision API's speed might outweigh the potential for slightly improved accuracy from a custom extractor.
upvoted 2 times
...
Foxy2021
1 year, 1 month ago
My vote is A. It is simple and does the job.
upvoted 1 times
...
AzureDP900
1 year, 4 months ago
C is right Document AI custom extractor: Allows you to train a custom model to extract relevant information (in this case, customer comments) from the PDF files. Natural Language API analyzeSentiment feature: Analyzes the sentiment of the extracted text to predict an overall satisfaction score.
upvoted 1 times
...
bobjr
1 year, 5 months ago
Selected Answer: A
C & D are overkill. We don't care about entity sentiment -> B is out. Left with A and https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/analyzeSentiment
upvoted 2 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: A
quickly predict an overall satisfaction -> a
upvoted 2 times
pinimichele01
1 year, 6 months ago
No, sorry, it's C: you need Document AI.
upvoted 1 times
...
...
edoo
1 year, 8 months ago
Selected Answer: A
I go with A, because "you need to quickly predict": no time for fine-tuning.
upvoted 3 times
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: C
My answer: Letter C Document AI is a suitable tool for cases where there are patterns of forms or documentation. Additionally, it is possible to directly read PDF files. In the Natural Language API, the analyzeSentiment function can determine the overall sentiment, as the text asks, "You need to quickly predict an overall satisfaction." The analyzeEntitySentiment function provides a score for each entity or word found. https://cloud.google.com/natural-language/docs/basics
upvoted 2 times
...
ddogg
1 year, 9 months ago
Selected Answer: C
Document AI custom extractor: This allows you to tailor the text extraction specifically to the layout and format of your customer feedback forms, ensuring accurate capture of the comments section. Natural Language API analyzeSentiment: This feature analyzes the extracted text and provides an overall sentiment score, which can be used to gauge customer satisfaction.
upvoted 1 times
...
pikachu007
1 year, 10 months ago
Selected Answer: C
Precision in text extraction: Document AI is specifically designed for extracting text from structured documents like forms, ensuring accurate extraction of comments, even with varying handwriting styles. Custom model for form layout: Training a custom extractor tailored to the hotel's feedback form layout further enhances accuracy and targets the relevant comments section effectively. Sentiment analysis: Natural Language API's analyzeSentiment feature analyzes overall sentiment in a text block, aligning with the goal of deriving overall satisfaction scores.
upvoted 2 times
...
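For the second half of option C: `analyzeSentiment` returns a document-level `score` in [-1.0, 1.0] plus a `magnitude`. Turning that score into an overall satisfaction score is then a small post-processing step. A sketch (the 1-5 scale and the linear mapping below are our own arbitrary choices, not part of the API):

```python
def satisfaction_from_sentiment(score):
    """Map a Natural Language API sentiment score (-1.0..1.0) onto a
    1-5 satisfaction scale. The linear rescale is an arbitrary choice."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("analyzeSentiment scores lie in [-1.0, 1.0]")
    # Rescale [-1, 1] -> [1, 5], then round to the nearest whole star.
    return round((score + 1.0) * 2 + 1)

print(satisfaction_from_sentiment(0.8))   # 5  (clearly positive comment)
print(satisfaction_from_sentiment(0.0))   # 3  (neutral)
print(satisfaction_from_sentiment(-0.9))  # 1  (clearly negative)
```

In practice you might also inspect `magnitude` to separate genuinely neutral comments from mixed ones, since both can produce a score near 0.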

Topic 1 Question 201


Exam Professional Machine Learning Engineer topic 1 question 201 discussion

You developed a Vertex AI pipeline that trains a classification model on data stored in a large BigQuery table. The pipeline has four steps, where each step is created by a Python function that uses the KubeFlow v2 API. The components have the following names:



You launch your Vertex AI pipeline as the following:



You perform many model iterations by adjusting the code and parameters of the training step. You observe high costs associated with the development, particularly the data export and preprocessing steps. You need to reduce model development costs. What should you do?

  • A. Change the components’ YAML filenames to export.yaml, preprocess.yaml, f"train-{dt}.yaml", f"calibrate-{dt}.yaml".
  • B. Add the {"kubeflow.v1.caching": True} parameter to the set of params provided to your PipelineJob.
  • C. Move the first step of your pipeline to a separate step, and provide a cached path to Cloud Storage as an input to the main pipeline.
  • D. Change the name of the pipeline to f"my-awesome-pipeline-{dt}".
Suggested Answer: A 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 9 months ago
Selected Answer: A
My Answer: A. From what I understood, it's about optimizing the process of adjusting code while reusing previously processed results from the pipeline. Kubeflow inherently caches these steps, eliminating the need to explicitly store results in a designated path. However, the original filenames include a timestamp ({dt}); by removing it from the export and preprocess components, those steps no longer need to rerun on every iteration. Option C could be an approach, but it would require more effort to implement (since Kubeflow handles caching automatically). Additionally, that option only mentions moving the first step, the export, and says nothing about preprocessing (which could be one of the more expensive steps). So, considering all of these factors, I think A is the best choice.
upvoted 8 times
...
mfounta
Most Recent 3 months, 1 week ago
Selected Answer: A
A "When caching is enabled for a component, KFP will reuse the component’s outputs if the component is executed again with the same inputs and parameters (and the output is still available)." With the dynamic file names, inputs will vary per execution. Caching will only work if the inputs are constant. https://www.kubeflow.org/docs/components/pipelines/user-guides/core-functions/caching/#:~:text=Caching%20is%20enabled%20by%20default,False)%20on%20a%20task%20object.&text=You%20can%20also%20enable%20or,submitting%20a%20pipeline%20for%20execution.
upvoted 4 times
...
HaroonRaizada01
8 months ago
Selected Answer: B
**Use Option B** (`{"kubeflow.v1.caching": True}`) to enable caching in your Vertex AI pipeline. This is the most efficient and cost-effective way to avoid redundant executions of expensive steps like data export and preprocessing.
upvoted 1 times
...
Sivaram06
10 months ago
Selected Answer: B
Adding caching to your pipeline by setting the parameter {"kubeflow.v1.caching": True} is the most efficient and effective approach to reduce model development costs, particularly for steps like data export and preprocessing, which are often time-consuming and costly to repeat during multiple iterations. This will help you avoid unnecessary re-computation and save on resource usage.
upvoted 1 times
...
lunalongo
11 months, 1 week ago
Selected Answer: C
Option A is a superficial change with no significant impact on cost optimization. Option C is the correct approach for effectively leveraging caching to reduce costs. C strategically uses the caching mechanism by separating the expensive preprocessing steps and storing their outputs in Cloud Storage, thus reducing costs by reusing the preprocessed data across multiple pipeline runs. Changing filenames could affect caching only if the caching mechanism relies on exact filename matching, which is unlikely. Besides, Kubeflow and Vertex AI Pipelines do not automatically handle caching of intermediate results; it is not inherent to the pipeline steps themselves; it's a feature that needs to be explicitly managed and leveraged.
upvoted 2 times
...
f084277
12 months ago
Selected Answer: A
A. The dynamic filename is causing kubeflow to be unable to cache the export and preprocess steps, causing the problems mentioned in the question.
upvoted 3 times
...
Foxy2021
1 year, 1 month ago
I select C: By leveraging a Dataproc cluster, you can maintain compatibility with your existing PySpark jobs, minimize management overhead, and create a scalable proof of concept quickly and efficiently.
upvoted 1 times
...
Foxy2021
1 year, 1 month ago
I select B. A: Changing the YAML filenames does not affect caching behavior or cost reduction. The pipeline's efficiency and cost effectiveness are primarily governed by how it handles inputs and outputs rather than the filenames of the components. C: Moving the first step to a separate pipeline may help with organization but doesn’t directly address the cost incurred by repeated data exports and preprocessing. Also, simply providing a cached path does not guarantee that the preprocessing step itself won’t be executed multiple times. D: Changing the name of the pipeline to include a timestamp or other identifier does not influence caching or resource usage. It merely alters the identification of the pipeline runs without any impact on the efficiency of the operations being performed.
upvoted 1 times
...
gscharly
1 year, 6 months ago
Selected Answer: A
see guilhermebutzke
upvoted 1 times
...
pinimichele01
1 year, 6 months ago
Selected Answer: A
see guilhermebutzke
upvoted 1 times
...
Yan_X
1 year, 8 months ago
Selected Answer: C
C Caching should be enabled for all steps, e.g., export, preprocessing and training.
upvoted 1 times
...
shadz10
1 year, 9 months ago
Selected Answer: C
Not A - changing file names does not help with reducing costs. Not B - you cannot directly use kubeflow.v1.caching on a pipeline that uses the KubeFlow v2 API. Version incompatibility: the kubeflow.v1.caching module is specifically designed for KubeFlow Pipelines v1, and its structure and functionality are not directly compatible with KubeFlow Pipelines v2. So the best option here is C.
upvoted 2 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: C
I considered B but a search of "kubeflow.v1.caching" on Google only produces 1 result, which is this very question on this very website. Thus, I rule it out as non-existent (please share a resource if there is any that proves it exists) and opt for C.
upvoted 1 times
...
BlehMaks
1 year, 10 months ago
Selected Answer: A
I think it's A. 1) If we want to reuse the same results several times we shouldn't rename them, so we need to delete {dt} from the first two components' names. 2) We already have the option enable_caching = True, so why would we need kubeflow.v1.caching? 3) I'm not sure, but maybe it does matter
upvoted 2 times
BlehMaks
1 year, 10 months ago
3) I'm not sure, but maybe it does matter that the KubeFlow v2 API and kubeflow.v1.caching have different versions (v1 and v2).
upvoted 1 times
...
...
pikachu007
1 year, 10 months ago
Selected Answer: B
Enables caching: Setting this parameter instructs Vertex AI Pipelines to cache the outputs of pipeline steps that have successfully completed. This means that if a step's inputs haven't changed, its execution can be skipped, reusing the cached output instead. Targets costly steps: The prompt highlights that data export and preprocessing steps are particularly expensive. Caching these steps can significantly reduce costs during model iterations.
upvoted 2 times
...
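The disagreement above (A vs. B) hinges on how KFP's execution caching decides on a hit: a step is reused only when the component and its inputs match a previous run, so a component name that embeds a timestamp can never match. A stdlib-only sketch of that behavior (an illustration of the caching idea, not the real KFP cache-key implementation):

```python
cache = {}
executions = []  # records every time the "expensive" work actually runs

def run_step(component_name, inputs):
    """Re-run a step only on a cache miss, as KFP execution caching does.

    The cache key here (component name + inputs) is a toy model of the
    real key, which is enough to show the filename effect.
    """
    key = (component_name, tuple(sorted(inputs.items())))
    if key not in cache:
        executions.append(component_name)  # expensive work happens here
        cache[key] = f"output-of-{component_name}"
    return cache[key]

# Two model iterations with a timestamped export step: always a miss.
for dt in ("2024-01-01", "2024-01-02"):
    run_step(f"export-{dt}", {"table": "bq://project.ds.table"})
assert executions == ["export-2024-01-01", "export-2024-01-02"]

# Stable name + unchanged inputs: the second iteration is a cache hit.
executions.clear()
for _ in range(2):
    run_step("export", {"table": "bq://project.ds.table"})
print(executions)  # ['export'] - the costly export ran only once
```

This is the mechanism behind answer A: dropping {dt} from the export and preprocess component names keeps their cache keys stable across training iterations, so only the train and calibrate steps rerun.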

Topic 1 Question 202


Exam Professional Machine Learning Engineer topic 1 question 202 discussion

You work for a startup that has multiple data science workloads. Your compute infrastructure is currently on-premises, and the data science workloads are native to PySpark. Your team plans to migrate their data science workloads to Google Cloud. You need to build a proof of concept to migrate one data science job to Google Cloud. You want to propose a migration process that requires minimal cost and effort. What should you do first?

  • A. Create a n2-standard-4 VM instance and install Java, Scala, and Apache Spark dependencies on it.
  • B. Create a Google Kubernetes Engine cluster with a basic node pool configuration, install Java, Scala, and Apache Spark dependencies on it.
  • C. Create a Standard (1 master, 3 workers) Dataproc cluster, and run a Vertex AI Workbench notebook instance on it.
  • D. Create a Vertex AI Workbench notebook with instance type n2-standard-4.
Suggested Answer: C 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 9 months ago
Selected Answer: C
My answer: C C: This option leverages Google Cloud's Dataproc service, which is designed for running Apache Spark and other big data processing frameworks. By creating a Standard Dataproc cluster, you can easily scale resources as needed for your workload. A. n2-standard-4 VM: This requires manual setup and ongoing maintenance, increasing cost and effort. B. GKE cluster: While offering containerization benefits, it necessitates managing containers and Spark configurations, adding complexity. D. With Vertex AI Workbench, your team can develop, train, and deploy machine learning models using popular frameworks like TensorFlow, PyTorch, and scikit-learn. However, while Vertex AI Workbench supports PySpark, it may not be the optimal choice for migrating existing PySpark workloads, as it's primarily focused on machine learning tasks.
upvoted 5 times
Carlose2108
1 year, 8 months ago
You're right but I have a doubt about in a part of Option D "You need to build a proof of concept to migrate one data science job to Google Cloud"
upvoted 2 times
...
...
bigdapper
Most Recent 2 months, 1 week ago
Selected Answer: D
D: minimal cost and effort. Can run PySpark on the notebook.
upvoted 1 times
...
lunalongo
11 months, 1 week ago
Selected Answer: C
C is the right answer because it ensures: Cost-effectiveness: Dataproc is managed and you only pay for the compute time used, which is cost-effective for a POC. A standard cluster is enough for the task. Ease of use: Dataproc simplifies the process of setting up and managing a Spark cluster. Minimal effort: a Dataproc cluster + a Vertex AI Workbench instance is a straightforward process through the console or command-line tools, minimizing setup time and effort compared to manually configuring VMs or Kubernetes clusters. *A and B include manual installation steps; D creates a notebook environment but it's not enough to run a PySpark job.
upvoted 2 times
...
DaleR
11 months, 3 weeks ago
D. Just ran a pilot on Workbench
upvoted 1 times
...
f084277
12 months ago
Selected Answer: D
D. "minimal cost and effort". There's only one answer.
upvoted 1 times
...
baimus
1 year, 2 months ago
Selected Answer: C
C and D are both valid, as people point out you can technically have Spark preinstalled on D. But this is for a proof of concept for the real design. The concept is not proved by using a notebook, as it's not best practice. Therefore C makes more sense, and is still low effort as it's managed.
upvoted 4 times
...
AK2020
1 year, 3 months ago
Selected Answer: C
C is the answer
upvoted 1 times
...
TanTran04
1 year, 4 months ago
Selected Answer: C
I'm following option C. Please take a look at the Dataproc documentation (ref: https://cloud.google.com/dataproc/docs). Option D doesn't provide a solution for managing and scaling the Spark environment, which is necessary for running PySpark workloads.
upvoted 3 times
...
fitri001
1 year, 6 months ago
Selected Answer: D
Vertex AI Workbench notebook: This option provides a pre-configured environment with popular data science libraries like PySpark already installed. It allows you to focus on migrating your PySpark code with minimal changes. n2-standard-4 instance type: This is a general-purpose machine type suitable for various data science tasks. It offers a good balance between cost and performance for initial exploration.
upvoted 1 times
fitri001
1 year, 6 months ago
A. Create a n2-standard-4 VM instance: This option requires manually installing Java, Scala, and Spark dependencies, which is time-consuming and prone to errors. It also involves managing the VM instance lifecycle, increasing complexity. B. Create a Google Kubernetes Engine cluster: Setting up and managing a Kubernetes cluster for a single job is overkill for a proof of concept. It adds unnecessary complexity and cost. C. Create a Standard Dataproc cluster: While Dataproc is a managed Spark environment on GCP, setting up a full cluster (master and workers) might be more resource-intensive than needed for a single job, especially for a proof of concept.
upvoted 1 times
...
pinimichele01
1 year, 6 months ago
https://cloud.google.com/architecture/hadoop/migrating-apache-spark-jobs-to-cloud-dataproc#overview why not c?
upvoted 1 times
...
Jason_Cloud_at
1 year, 2 months ago
Option D doesn't provide PySpark out of the box; you have to install it manually, whereas in C, Dataproc is a managed Spark and Hadoop service that supports running PySpark jobs right away.
upvoted 1 times
...
...
gscharly
1 year, 6 months ago
Selected Answer: D
went with D: https://cloud.google.com/vertex-ai/docs/workbench/instances/create-dataproc-enabled
upvoted 2 times
pinimichele01
1 year, 6 months ago
https://cloud.google.com/architecture/hadoop/migrating-apache-spark-jobs-to-cloud-dataproc#overview
upvoted 1 times
...
...
pinimichele01
1 year, 7 months ago
Selected Answer: C
When you want to move your Apache Spark workloads from an on-premises environment to Google Cloud, we recommend using Dataproc to run Apache Spark/Apache Hadoop clusters. https://cloud.google.com/architecture/hadoop/migrating-apache-spark-jobs-to-cloud-dataproc#overview
upvoted 1 times
...
Yan_X
1 year, 8 months ago
Selected Answer: D
D. You can use the notebook's pre-installed libraries and tools, including PySpark.
upvoted 2 times
...
Carlose2108
1 year, 8 months ago
Selected Answer: D
My bad, I meant Option D.
upvoted 1 times
...
Carlose2108
1 year, 8 months ago
Selected Answer: C
I went with C. For Proof Of Concept and requires minimal cost and effort. Furthermore, Vertex AI Workbench notebooks come pre-configured with PySpark.
upvoted 2 times
...
ddogg
1 year, 9 months ago
Selected Answer: C
Agree with BlehMaks https://cloud.google.com/architecture/hadoop/migrating-apache-spark-jobs-to-cloud-dataproc#overview Dataproc cluster seems more suitable
upvoted 2 times
...
shadz10
1 year, 9 months ago
Selected Answer: D
https://cloud.google.com/vertex-ai-notebooks?hl=en — "Data Lake and Spark in one place. Whether you use TensorFlow, PyTorch, or Spark, you can run any engine from Vertex AI Workbench." D is correct
upvoted 1 times
...
BlehMaks
1 year, 10 months ago
Selected Answer: C
https://cloud.google.com/architecture/hadoop/migrating-apache-spark-jobs-to-cloud-dataproc#overview
upvoted 2 times
...
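To ground the C-vs-D debate above: a minimal sketch of what option C's "Standard Dataproc cluster" actually involves. The request bodies are built as plain dicts so the shape is visible without a GCP project; field names follow the Dataproc v1 REST API, and the project, cluster, and bucket names are hypothetical placeholders.

```python
# Sketch: request bodies for option C, as plain dicts (field names per the
# Dataproc v1 REST API; project/cluster/bucket names are hypothetical).
cluster = {
    "projectId": "my-project",
    "clusterName": "pyspark-poc",
    "config": {
        "masterConfig": {"numInstances": 1, "machineTypeUri": "n2-standard-4"},
        "workerConfig": {"numInstances": 2, "machineTypeUri": "n2-standard-4"},
    },
}

# A PySpark job submitted to that cluster just references the script --
# no manual Java/Scala/Spark installation, which is the "minimal effort"
# argument commenters make against option A.
job = {
    "placement": {"clusterName": cluster["clusterName"]},
    "pysparkJob": {"mainPythonFileUri": "gs://my-bucket/job.py"},
}
```

With the real client libraries these dicts would be passed to the `google.cloud.dataproc_v1` cluster and job controller clients; here they only document the shape of the two API calls.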

Topic 1 Question 203

You work for a bank. You have been asked to develop an ML model that will support loan application decisions. You need to determine which Vertex AI services to include in the workflow. You want to track the model’s training parameters and the metrics per training epoch. You plan to compare the performance of each version of the model to determine the best model based on your chosen metrics. Which Vertex AI services should you use?

  • A. Vertex ML Metadata, Vertex AI Feature Store, and Vertex AI Vizier
  • B. Vertex AI Pipelines, Vertex AI Experiments, and Vertex AI Vizier
  • C. Vertex ML Metadata, Vertex AI Experiments, and Vertex AI TensorBoard
  • D. Vertex AI Pipelines, Vertex AI Feature Store, and Vertex AI TensorBoard
Suggested Answer: C 🗳️

Comments

dija123
1 year, 4 months ago
Selected Answer: C
Agree with C
upvoted 2 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: C
agree with pikachu007
upvoted 1 times
...
VipinSingla
1 year, 8 months ago
Why not B ?
upvoted 2 times
info_appsatori
1 year, 3 months ago
I guess because Vizier is a tool that helps to tune hyperparameters, whereas TensorBoard is a tool to explore experiments.
upvoted 3 times
...
...
Carlose2108
1 year, 8 months ago
Selected Answer: C
I went C
upvoted 1 times
...
winston9
1 year, 10 months ago
Selected Answer: C
use Tensorboard to track the model’s training parameters and the metrics per training epoch.
upvoted 3 times
...
pikachu007
1 year, 10 months ago
Selected Answer: C
Vertex ML Metadata: Tracks model training parameters, hyperparameters, metrics, and lineage information. Stores metadata in a central repository for easy access and comparison. Integrates seamlessly with Vertex AI Experiments and TensorBoard.
Vertex AI Experiments: Organizes and manages model training runs as experiments. Visualizes experiment results, including metrics and parameter comparisons. Facilitates tracking of the best performing model versions.
Vertex AI TensorBoard: Provides detailed visualizations of training metrics and model performance. Enables analysis of model behavior at each training epoch. Integrates with Vertex AI Experiments for seamless access to experiment data.
upvoted 4 times
...
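To make answer C concrete: what tracking parameters and per-epoch metrics with Vertex AI Experiments looks like. The `aiplatform` calls are shown only in comments (they need a real project); the loop below builds the same record locally so the shape is testable. The experiment and run names, parameters, and loss values are hypothetical.

```python
# Real usage (requires a GCP project and the google-cloud-aiplatform SDK):
#   from google.cloud import aiplatform
#   aiplatform.init(project="my-project", experiment="loan-model")  # names hypothetical
#   with aiplatform.start_run("run-v1") as run:
#       run.log_params({"lr": 0.01, "batch_size": 64})
#       for epoch, loss in enumerate(losses):
#           run.log_time_series_metrics({"loss": loss}, step=epoch)
#
# Local stand-in: collect the same information into a plain record.
run_record = {"params": {"lr": 0.01, "batch_size": 64}, "metrics": []}
for epoch, loss in enumerate([0.9, 0.5, 0.3]):  # hypothetical per-epoch losses
    run_record["metrics"].append({"step": epoch, "loss": loss})
```

Comparing several such runs side by side (the "best model per chosen metric" requirement) is exactly what the Experiments UI and TensorBoard time-series views provide on top of this data.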

Topic 1 Question 204

You work for an auto insurance company. You are preparing a proof-of-concept ML application that uses images of damaged vehicles to infer damaged parts. Your team has assembled a set of annotated images from damage claim documents in the company’s database. The annotations associated with each image consist of a bounding box for each identified damaged part and the part name. You have been given a sufficient budget to train models on Google Cloud. You need to quickly create an initial model. What should you do?

  • A. Download a pre-trained object detection model from TensorFlow Hub. Fine-tune the model in Vertex AI Workbench by using the annotated image data.
  • B. Train an object detection model in AutoML by using the annotated image data.
  • C. Create a pipeline in Vertex AI Pipelines and configure the AutoMLTrainingJobRunOp component to train a custom object detection model by using the annotated image data.
  • D. Train an object detection model in Vertex AI custom training by using the annotated image data.
Suggested Answer: B 🗳️

Comments

pikachu007
Highly Voted 1 year, 10 months ago
Selected Answer: B
Speed: AutoML excels in creating high-quality models with minimal code and setup, significantly accelerating model development.
Ease of use: It provides a user-friendly interface and automates many aspects of model training, making it accessible even for those without extensive ML expertise.
Automatic optimization: AutoML automatically handles hyperparameter tuning, feature engineering, and architecture selection, reducing manual effort and expertise required.
Custom object detection: It supports custom object detection tasks, directly addressing the need to identify damaged parts in images.
upvoted 5 times
...
louisaok
Most Recent 11 months, 4 weeks ago
Selected Answer: B
>> "You have been given a sufficient budget to train models on Google Cloud" — it is rare to see a company give enough money to run a mission-critical project.
upvoted 1 times
...
Foxy2021
1 year, 1 month ago
My vote is B
upvoted 1 times
...
VinaoSilva
1 year, 4 months ago
Selected Answer: B
quickly create an initial model = automl
upvoted 3 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: B
went with B
upvoted 2 times
...
edoo
1 year, 8 months ago
Selected Answer: B
By doing B we are doing D. I suppose B is more specific about the model and thus "more" correct? Thoughts?
upvoted 2 times
...
ddogg
1 year, 9 months ago
Selected Answer: B
B makes the most sense, data is already labelled and a pretrained model may not fit for this specific case
upvoted 1 times
...

Topic 1 Question 205

You are analyzing customer data for a healthcare organization that is stored in Cloud Storage. The data contains personally identifiable information (PII). You need to perform data exploration and preprocessing while ensuring the security and privacy of sensitive fields. What should you do?

  • A. Use the Cloud Data Loss Prevention (DLP) API to de-identify the PII before performing data exploration and preprocessing.
  • B. Use customer-managed encryption keys (CMEK) to encrypt the PII data at rest, and decrypt the PII data during data exploration and preprocessing.
  • C. Use a VM inside a VPC Service Controls security perimeter to perform data exploration and preprocessing.
  • D. Use Google-managed encryption keys to encrypt the PII data at rest, and decrypt the PII data during data exploration and preprocessing.
Suggested Answer: A 🗳️

Comments

fitri001
1 year ago
Selected Answer: A
Cloud DLP API: This service redacts or replaces sensitive information in your data before processing. It allows data exploration and analysis without exposing PII directly. Privacy Preservation: De-identification ensures sensitive information is not revealed during analysis, protecting patient privacy.
upvoted 4 times
fitri001
1 year ago
B. CMEK and decryption: While CMEKs provide strong encryption, decrypting PII data during exploration exposes sensitive information. This increases the risk of accidental leaks or unauthorized access. C. VM with VPC Service Controls: This approach can add complexity and doesn't directly address PII privacy concerns during analysis. D. Google-managed encryption and decryption: Similar to option B, decrypting PII data for exploration weakens privacy.
upvoted 2 times
...
...
pinimichele01
1 year, 1 month ago
Selected Answer: A
https://cloud.google.com/dlp/docs/inspect-sensitive-text-de-identify
upvoted 1 times
...
edoo
1 year, 2 months ago
Selected Answer: A
A is obvious.
upvoted 1 times
...
b1a8fae
1 year, 3 months ago
Selected Answer: A
A. https://cloud.google.com/dlp/docs/inspect-sensitive-text-de-identify
upvoted 1 times
...
pikachu007
1 year, 4 months ago
Selected Answer: A
Minimizes exposure of sensitive data: De-identification replaces or removes sensitive information, reducing the risk of accidental exposure or unauthorized access during analysis.
Preserves data utility: DLP can de-identify data while maintaining its usefulness for exploration and preprocessing, ensuring meaningful analysis without compromising privacy.
Flexibility in de-identification: You can choose appropriate de-identification techniques (e.g., masking, pseudonymization, generalization) based on specific privacy requirements and analysis needs.
upvoted 2 times
...
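What option A does in practice: DLP inspects free text for configured infoTypes and replaces matches before anyone explores the data. The request dict below follows the shape of the DLP v2 `deidentify_content` call; the tiny `mask` function is only a local stand-in for the service (a regex, not the real detector) so the before/after is visible and testable without credentials.

```python
import re

# Shape of a DLP v2 deidentify_content request, as a plain dict:
# replace any detected EMAIL_ADDRESS with its infoType name.
deidentify_request = {
    "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
    "deidentify_config": {
        "info_type_transformations": {
            "transformations": [
                {"primitive_transformation": {"replace_with_info_type_config": {}}}
            ]
        }
    },
    "item": {"value": "Patient reachable at jane.doe@example.com"},
}

def mask(text: str) -> str:
    """Local stand-in for the DLP service: swap emails for the infoType name."""
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL_ADDRESS]", text)

redacted = mask(deidentify_request["item"]["value"])
```

The de-identified output is what lands in the exploration/preprocessing environment, so the PII never has to be decrypted or exposed there — the contrast with options B and D discussed above.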

Topic 1 Question 206

You are building a predictive maintenance model to preemptively detect part defects in bridges. You plan to use high definition images of the bridges as model inputs. You need to explain the output of the model to the relevant stakeholders so they can take appropriate action. How should you build the model?

  • A. Use scikit-learn to build a tree-based model, and use SHAP values to explain the model output.
  • B. Use scikit-learn to build a tree-based model, and use partial dependence plots (PDP) to explain the model output.
  • C. Use TensorFlow to create a deep learning-based model, and use Integrated Gradients to explain the model output.
  • D. Use TensorFlow to create a deep learning-based model, and use the sampled Shapley method to explain the model output.
Suggested Answer: C 🗳️

Comments

FireAtMe
11 months, 1 week ago
Selected Answer: C
The question is about image/pixels. So the integrated Gradients is better. Shapley is for input features.
upvoted 2 times
...
dija123
1 year, 4 months ago
Selected Answer: C
Use Integrated Gradients to explain the model output
upvoted 3 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: C
https://cloud.google.com/vertex-ai/docs/explainable-ai/overview
upvoted 2 times
...
Shark0
1 year, 7 months ago
Selected Answer: C
Given the scenario of using high definition images as inputs for predictive maintenance on bridges, and the need to explain the model output to stakeholders, the most appropriate choice would be: C. Use TensorFlow to create a deep learning-based model, and use Integrated Gradients to explain the model output. Integrated Gradients is a method used to explain the predictions of deep learning models by attributing the contribution of each pixel in the input image to the final prediction. This would provide insights into which parts of the bridge images are most influential in the model's decision-making process, helping stakeholders understand why a particular prediction was made and allowing them to take appropriate action.
upvoted 4 times
...
BlehMaks
1 year, 10 months ago
Selected Answer: C
https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/overview#compare-methods
upvoted 2 times
pinimichele01
1 year, 7 months ago
https://cloud.google.com/vertex-ai/docs/explainable-ai/overview — this is right; yours is deprecated!
upvoted 1 times
...
...
pikachu007
1 year, 10 months ago
Selected Answer: C
Handling image input: Deep learning models excel in processing complex visual data like high-definition images, making them ideal for extracting relevant features from bridge images for defect detection.
Explainability with Integrated Gradients: Integrated Gradients is a powerful technique specifically designed to explain the predictions of deep learning models. It attributes model output to specific input features, providing insights into how the model makes decisions.
Visualization: Integrated Gradients can generate visual explanations, such as heatmaps, that highlight image regions most influential to predictions, aiding in understanding and trust for stakeholders.
upvoted 1 times
...
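For intuition on why C fits: Integrated Gradients attributes a prediction to each input by integrating the gradient along a straight path from a baseline (e.g. a black image) to the input. A minimal pure-Python sketch on a toy linear "model" — for a linear function the attributions are exact and, by the completeness axiom, sum to f(x) − f(baseline). The weights and inputs are made-up illustration values.

```python
def integrated_gradients(grad_f, x, baseline, steps=50):
    """Riemann-sum approximation of IG_i = (x_i - b_i) * integral_0^1 df/dx_i."""
    n = len(x)
    total = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule along the baseline->input path
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        for i in range(n):
            total[i] += g[i]
    return [(x[i] - baseline[i]) * total[i] / steps for i in range(n)]

# Toy linear model f(x) = 2*x0 - 1*x1, so the gradient is constant.
w = [2.0, -1.0]
grad_f = lambda point: w
attributions = integrated_gradients(grad_f, x=[1.0, 3.0], baseline=[0.0, 0.0])
# For a linear model IG is exact: attributions come out [2.0, -3.0],
# summing to f(x) - f(baseline) = -1.0.
```

For a deep image model the same scheme runs per pixel (with the gradient from backprop), which is what produces the attribution heatmaps stakeholders can act on; sampled Shapley (option D) is instead aimed at non-differentiable/tabular models.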

Topic 1 Question 207

You work for a hospital that wants to optimize how it schedules operations. You need to create a model that uses the relationship between the number of surgeries scheduled and beds used. You want to predict how many beds will be needed for patients each day in advance based on the scheduled surgeries. You have one year of data for the hospital organized in 365 rows.

The data includes the following variables for each day:
• Number of scheduled surgeries
• Number of beds occupied
• Date

You want to maximize the speed of model development and testing. What should you do?

  • A. Create a BigQuery table. Use BigQuery ML to build a regression model, with number of beds as the target variable, and number of scheduled surgeries and date features (such as day of week) as the predictors.
  • B. Create a BigQuery table. Use BigQuery ML to build an ARIMA model, with number of beds as the target variable, and date as the time variable.
  • C. Create a Vertex AI tabular dataset. Train an AutoML regression model, with number of beds as the target variable, and number of scheduled minor surgeries and date features (such as day of the week) as the predictors.
  • D. Create a Vertex AI tabular dataset. Train a Vertex AI AutoML Forecasting model, with number of beds as the target variable, number of scheduled surgeries as a covariate and date as the time variable.
Suggested Answer: D 🗳️

Comments

el_vampiro
2 months ago
Selected Answer: B
BQML ARIMA+ is the quickest to develop and test as it trains faster than AutoML. It can do Multivariate forecasting.
upvoted 1 times
...
Fer660
2 months ago
Selected Answer: D
Not A or C: regression will not do the trick because this is a time series. Not B: ARIMA is univariate, so if you are using the date as the time variable, you won't be able to use the number of scheduled surgeries as a predictor, and you lose the key piece of information.
upvoted 1 times
...
NamitSehgal
8 months, 4 weeks ago
Selected Answer: D
A (BigQuery ML regression): A simple regression model is not designed for time series forecasting. It wouldn't capture the temporal dependencies in the data and wouldn't be able to effectively predict future bed usage based on past trends. D correctly treats this as a forecasting task.
upvoted 1 times
...
Hrishikesh1992
9 months, 3 weeks ago
Selected Answer: B
We have only 365 rows of data; I went with B because we require a statistical model here rather than an ML model.
upvoted 1 times
...
lunalongo
11 months, 1 week ago
Selected Answer: A
A is the best option because:
- BigQuery ML allows model build/training within BigQuery using SQL
- This is a regression model, not a timeseries forecast; no ARIMA (B) fit!
- Data transfer to Vertex AI (C, D) and usage of AutoML not needed
- AutoML is better for larger datasets, BigQuery ML works for 365 rows
upvoted 4 times
...
forport
1 year, 3 months ago
Selected Answer: D
'Vertex AI AutoML Forecasting' == for forecasting time series data
upvoted 4 times
...
VinaoSilva
1 year, 4 months ago
Selected Answer: D
"You want to predict how many beds will be needed for patients each day" = Forecasting
upvoted 3 times
...
dija123
1 year, 4 months ago
Selected Answer: D
Train a Vertex AI AutoML Forecasting model
upvoted 1 times
...
info_appsatori
1 year, 4 months ago
Selected Answer: A
IDK, I'm going with A because it maximizes the speed of development and testing. Also, the question says you need to create a model that uses the "relationship" between the number of surgeries scheduled and beds used = linear regression problem.
upvoted 1 times
...
b2aaace
1 year, 6 months ago
Selected Answer: C
I don't think this is a time series forecasting problem. The question clearly states that we should predict the number of beds based on the number of scheduled surgeries. This is a simple linear regression problem.
upvoted 1 times
pinimichele01
1 year, 6 months ago
"You want to predict how many beds will be needed for patients each day in advance based on the scheduled surgeries."
upvoted 1 times
...
...
fitri001
1 year, 6 months ago
Selected Answer: D
Vertex AI AutoML Forecasting: This option leverages Vertex AI's AutoML capabilities for time series forecasting. It automatically explores different model types and hyperparameters to find the best fit for your data. This can significantly speed up model development compared to building a model from scratch. Date as time variable, surgeries as covariate: This approach acknowledges the time-series nature of bed occupancy with "date" as the time series variable. It also incorporates the "number of scheduled surgeries" as a covariate, allowing the model to learn the relationship between surgeries and bed usage.
upvoted 2 times
fitri001
1 year, 6 months ago
A. BigQuery ML regression: While BigQuery ML offers quick model building, a regression model might not capture the time-series aspect of daily bed occupancy. Daily bed occupancy might have trends or seasonality which a plain regression model wouldn't capture. B. BigQuery ML ARIMA: ARIMA models are specifically for stationary time series data, and hospital bed occupancy might not always be stationary (e.g., holiday season might lead to higher occupancy). Additionally, ARIMA models typically don't incorporate additional features like the number of scheduled surgeries. C. Vertex AI AutoML Regression: Similar to option A, a regression model might not capture the time series aspect. While Vertex AI offers AutoML regression, using a solution designed for time series forecasting is more suitable here.
upvoted 3 times
el_vampiro
2 months ago
ARIMA can do multivariate and also take holidays into account. Also trains much faster than AutoML
upvoted 1 times
...
...
...
pinimichele01
1 year, 7 months ago
Selected Answer: D
best suited
upvoted 1 times
pinimichele01
1 year, 6 months ago
not b: ARIMA does not use number of scheduled surgeries, and it is stated that the prediction must be based on that variable
upvoted 1 times
el_vampiro
2 months ago
ARIMA can do multivariate timeseries
upvoted 1 times
...
...
...
CHARLIE2108
1 year, 8 months ago
Selected Answer: B
I went with B.
upvoted 1 times
...
sonicclasps
1 year, 9 months ago
Selected Answer: D
best suited, and treats the input as a time series, unlike A
upvoted 1 times
...
Yan_X
1 year, 9 months ago
Selected Answer: D
D, as B doesn't mention the 'number of scheduled surgeries'.
upvoted 2 times
...
shadz10
1 year, 10 months ago
Selected Answer: D
D is correct I believe
upvoted 2 times
shadz10
1 year, 9 months ago
https://cloud.google.com/vertex-ai/docs/tabular-data/forecasting/overview
upvoted 1 times
...
...
b1a8fae
1 year, 10 months ago
Selected Answer: A
A. Using BigQuery complies with the requirement of speed of development. ARIMA does not use the number of scheduled surgeries, and it is stated that the prediction must be based on that variable. So it must be A: an LR model on BQ using scheduled surgeries, day of the week, etc., as predictors.
upvoted 4 times
...
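On the B-vs-D debate above: in BigQuery ML, plain `ARIMA_PLUS` takes only a time column and a target, so the surgeries count would be ignored, while the `ARIMA_PLUS_XREG` model type does accept extra regressors. A sketch of what that statement looks like — the table and column names are hypothetical, and the SQL is held in a string so the relevant options are visible and checkable locally.

```python
# Hypothetical table/column names; CREATE MODEL options follow BigQuery ML's
# ARIMA_PLUS_XREG syntax, which -- unlike plain ARIMA_PLUS -- uses extra
# SELECT columns (here scheduled_surgeries) as side regressors.
create_model_sql = """
CREATE OR REPLACE MODEL `my_dataset.bed_forecast`
OPTIONS(
  model_type = 'ARIMA_PLUS_XREG',
  time_series_timestamp_col = 'date',
  time_series_data_col = 'beds_occupied'
) AS
SELECT date, beds_occupied, scheduled_surgeries
FROM `my_dataset.daily_hospital_stats`
"""
```

This is why several voters reject B as written (plain ARIMA drops the surgeries signal) while D's AutoML Forecasting model accepts the surgeries count as a covariate directly.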

Topic 1 Question 208

You recently developed a wide and deep model in TensorFlow. You generated training datasets using a SQL script that preprocessed raw data in BigQuery by performing instance-level transformations of the data. You need to create a training pipeline to retrain the model on a weekly basis. The trained model will be used to generate daily recommendations. You want to minimize model development and training time. How should you develop the training pipeline?

  • A. Use the Kubeflow Pipelines SDK to implement the pipeline. Use the BigQueryJobOp component to run the preprocessing script and the CustomTrainingJobOp component to launch a Vertex AI training job.
  • B. Use the Kubeflow Pipelines SDK to implement the pipeline. Use the DataflowPythonJobOp component to preprocess the data and the CustomTrainingJobOp component to launch a Vertex AI training job.
  • C. Use the TensorFlow Extended SDK to implement the pipeline. Use the ExampleGen component with the BigQuery executor to ingest the data, the Transform component to preprocess the data, and the Trainer component to launch a Vertex AI training job.
  • D. Use the TensorFlow Extended SDK to implement the pipeline. Implement the preprocessing steps as part of the input_fn of the model. Use the ExampleGen component with the BigQuery executor to ingest the data and the Trainer component to launch a Vertex AI training job.
Suggested Answer: C 🗳️

Comments

lunalongo
Highly Voted 11 months, 1 week ago
Selected Answer: C
C is the best option because:
- TFX is designed for ML pipelines, reducing custom code needs and the development and training time the statement requires
- ExampleGen with the BQ executor eliminates data export needs
- The Trainer component seamlessly integrates with Vertex AI, leveraging its managed infrastructure for training, further reducing development and operational overhead.
A & B use Kubeflow Pipelines, which would mean more development time and code customization (your model is in TensorFlow); D puts preprocessing inside input_fn, which is generally less efficient for large datasets and complex transformations.
upvoted 5 times
...
OpenKnowledge
Most Recent 3 weeks, 6 days ago
Selected Answer: C
TFX components:
ExampleGen: ingests and optionally splits the input dataset into training and evaluation sets.
StatisticsGen: generates descriptive statistics for the dataset, which are crucial for understanding data characteristics and identifying potential issues.
Transform: performs feature engineering and data preprocessing using TensorFlow Transform. It applies transformations to the raw data to prepare it for model training.
Trainer: trains the machine learning model using the preprocessed data and specified model architecture.
Tuner: optimizes the hyperparameters of the model to improve performance.
upvoted 1 times
...
bigdapper
2 months, 1 week ago
Selected Answer: A
Ans: A The goal is to minimize model development and training time. The processing script is already developed in BQ. Option C requires rewriting and reimplementing the transformation logic.
upvoted 4 times
...
batevv
7 months, 2 weeks ago
Selected Answer: A
The correct answer is: A. Use the Kubeflow Pipelines SDK to implement the pipeline. Use the BigQueryJobOp component to run the preprocessing script and the CustomTrainingJobOp component to launch a Vertex AI training job. Kubeflow Pipelines (KFP) is a good choice for orchestrating training pipelines, especially since the requirement is to minimize model development and training time. The BigQueryJobOp component is appropriate because the data preprocessing is already performed in BigQuery using SQL scripts. Using BigQueryJobOp avoids unnecessary additional processing layers. The CustomTrainingJobOp component allows launching a Vertex AI training job, which aligns with the need for scalable and managed model training.
upvoted 1 times
...
tdum76000
1 year, 2 months ago
Selected Answer: D
"If you use TensorFlow in an ML workflow that processes terabytes of structured data or text data, we recommend that you build your pipeline using TFX." https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline Google recommends TFX for large amounts of structured data. Use input_fn for the TensorFlow model as it will output a tf.data.Dataset object. Note: As it is not mentioned that we are working with terabytes of data, Kubeflow is a viable option and I would choose answer A, but I'll stick to Google's recommendations.
upvoted 2 times
...
forport
1 year, 3 months ago
Selected Answer: C
Option C is the most suitable because TFX provides a comprehensive MLOps framework, seamlessly integrating data ingestion, preprocessing, and model training, while also offering strong support for Vertex AI, making it the most efficient solution for the given use case.
upvoted 2 times
...
AK2020
1 year, 3 months ago
Selected Answer: C
C. Use the TensorFlow Extended SDK to implement the pipeline. Use the ExampleGen component with the BigQuery executor to ingest the data, the Transform component to preprocess the data, and the Trainer component to launch a Vertex AI training job.
upvoted 2 times
...
TanTran04
1 year, 4 months ago
Selected Answer: A
I go with A Kubeflow Pipelines SDK: supports machine learning and includes components specifically for tasks like data preprocessing, model training, and validation. BigQueryJobOp: enabling you to preprocess data using SQL scripts efficiently within BigQuery.
upvoted 1 times
...
SausageMuffins
1 year, 5 months ago
Selected Answer: C
ExampleGen directly ingests data from BigQuery, and the Transform component makes it more efficient than using an input_fn. I chose C over A and B because Kubeflow Pipelines is more sophisticated and requires more setup and effort due to its customizability.
upvoted 3 times
...
gscharly
1 year, 6 months ago
Selected Answer: A
agree with guilhermebutzke
upvoted 1 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: A
agree with guilhermebutzke
upvoted 1 times
...
Shark0
1 year, 7 months ago
Selected Answer: C
Given the requirement to minimize model development and training time while creating a training pipeline for a wide and deep model trained on datasets preprocessed using a SQL script in BigQuery, the most suitable option is: C. Use the TensorFlow Extended SDK to implement the pipeline. Use the ExampleGen component with the BigQuery executor to ingest the data, the Transform component to preprocess the data, and the Trainer component to launch a Vertex AI training job. This option leverages TensorFlow Extended (TFX), which is designed for scalable and production-ready machine learning pipelines. The ExampleGen component with the BigQuery executor efficiently ingests data from BigQuery. The Transform component applies preprocessing steps to the data, and the Trainer component launches a Vertex AI training job, minimizing the time and effort required for model development and training.
upvoted 2 times
...
Carlose2108
1 year, 8 months ago
Why not C?
upvoted 1 times
...
guilhermebutzke
1 year, 8 months ago
My Answer: A. According to this documentation: https://cloud.google.com/vertex-ai/docs/tabular-data/tabular-workflows/overview
A: CORRECT: BigQueryJobOp runs the existing preprocessing script that already resides there, and CustomTrainingJobOp launches custom training jobs on Vertex AI, which aligns with the requirement of using the pre-trained TensorFlow model.
B: Not correct: While DataflowPythonJobOp can be used for preprocessing, this increases development time compared to the simpler BigQueryJobOp approach.
C and D: Not correct: While possible, using the TensorFlow Extended SDK with its components introduces unnecessary complexity for this specific scenario. For example, why use ExampleGen? Implementing preprocessing within the model's input_fn is generally not recommended due to potential efficiency drawbacks and training-serving skew issues.
upvoted 4 times
...
BlehMaks
1 year, 9 months ago
Selected Answer: A
D is wrong. Google doesn't recommend using input_fn for preprocessing: https://www.tensorflow.org/tfx/guide/tft_bestpractices#preprocessing_options_summary
upvoted 2 times
...
pikachu007
1 year, 10 months ago
Selected Answer: D
Addressing Limitations of Other Options: Kubeflow Pipelines (A and B): While Kubeflow offers flexibility, it might require more setup and configuration, potentially increasing development time compared to TFX's integrated approach. Separate Preprocessing (C): Using a separate Transform component for preprocessing can add complexity and potential overheads, especially for instance-level transformations that can often be directly integrated within the model's input pipeline.
upvoted 1 times
...

Topic 1 Question 209


You are training a custom language model for your company using a large dataset. You plan to use the Reduction Server strategy on Vertex AI. You need to configure the worker pools of the distributed training job. What should you do?

  • A. Configure the machines of the first two worker pools to have GPUs, and to use a container image where your training code runs. Configure the third worker pool to have GPUs, and use the reductionserver container image.
  • B. Configure the machines of the first two worker pools to have GPUs and to use a container image where your training code runs. Configure the third worker pool to use the reductionserver container image without accelerators, and choose a machine type that prioritizes bandwidth.
  • C. Configure the machines of the first two worker pools to have TPUs and to use a container image where your training code runs. Configure the third worker pool without accelerators, and use the reductionserver container image without accelerators, and choose a machine type that prioritizes bandwidth.
  • D. Configure the machines of the first two pools to have TPUs, and to use a container image where your training code runs. Configure the third pool to have TPUs, and use the reductionserver container image.
Suggested Answer: B 🗳️

Comments

OpenKnowledge
2 months, 1 week ago
Selected Answer: B
https://cloud.google.com/blog/products/ai-machine-learning/faster-distributed-training-with-google-clouds-reduction-server
upvoted 1 times
...
Pau1234
11 months ago
Selected Answer: B
Reduction server strategy: 1. Only GPUs 2. You do not use GPUs for the Reduction Server nodes. https://cloud.google.com/vertex-ai/docs/training/distributed-training
upvoted 2 times
...
lunalongo
11 months, 1 week ago
Selected Answer: B
B is the right answer because: - Reduction Server strategy is generally implemented with GPUs, not TPUs. - First 2 pools' replicas perform model training; need GPUs for faster processing - Container image should contain your custom training code. - 3rd pool contains reduction server, no GPU is needed here; prioritize network bandwidth instead!
upvoted 2 times
...
wences
1 year, 1 month ago
Selected Answer: B
The real reason for answer B is the custom model, which means it is not well suited for TPUs
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: B
GPUs for Training: Configure the first two worker pools with GPUs to leverage the hardware acceleration capabilities for your custom language model training code. Reduction Server without GPUs: The third worker pool should use the reductionserver container image. This image is pre-configured for Reduction Server functionality and doesn't require GPUs. High-Bandwidth CPU: Choose a machine type with high bandwidth for the third pool since Reduction Server focuses on communication and gradient reduction.
upvoted 3 times
fitri001
1 year, 6 months ago
A. GPUs for Reduction Server: Reduction Server itself doesn't require or benefit from GPUs. It focuses on communication and reduction of gradients. It's better to use a CPU-based machine type for the third pool. C. TPUs instead of GPUs: While TPUs can be used for training some language models, Reduction Server specifically works with GPUs using the NCCL library. Configure your first two pools with GPUs for your training code. D. TPUs in Reduction Server pool: Similar to option A, Reduction Server doesn't benefit from TPUs. It's best to use a CPU with high bandwidth for the third pool.
upvoted 2 times
...
...
pinimichele01
1 year, 7 months ago
Selected Answer: B
https://cloud.google.com/blog/topics/developers-practitioners/optimize-training-performance-reduction-server-vertex-ai In this article, we introduce Reduction Server, a new Vertex AI feature that optimizes bandwidth and latency of multi-node distributed training on NVIDIA GPUs for synchronous data parallel algorithms.
upvoted 1 times
...
shadz10
1 year, 10 months ago
Selected Answer: B
TPUs are not supported for reductionserver so B
upvoted 3 times
...
winston9
1 year, 10 months ago
Selected Answer: B
bandwidth is important for the reduction server
upvoted 2 times
...
pikachu007
1 year, 10 months ago
Selected Answer: B
Worker Pools 1 and 2: These pools are responsible for the actual model training tasks. They require GPUs (or TPUs, if applicable to your model) to accelerate model computations. They run the container image containing your training code. Worker Pool 3: This pool is dedicated to the reduction server. It doesn't require accelerators (GPUs or TPUs) for gradient aggregation. Prioritize machines with high network bandwidth to optimize gradient exchange. Use the specific reductionserver container image for this pool.
upvoted 3 times
...
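The pool layout described in answer B and the comments above can be sketched as a Vertex AI `worker_pool_specs` list. This is a minimal sketch: the trainer image, machine types, and replica counts are illustrative assumptions, not values given in the question; only the layout matters (GPU training pools first, an accelerator-free, bandwidth-oriented Reduction Server pool last).

```python
# Hypothetical trainer image; replace with your own training container.
TRAINER_IMAGE = "gcr.io/my-project/llm-trainer:latest"
# Reduction Server image documented by Google Cloud for this feature.
REDUCTION_SERVER_IMAGE = (
    "us-docker.pkg.dev/vertex-ai-restricted/training/reductionserver:latest"
)

worker_pool_specs = [
    {   # pool 0: chief/primary replica -- GPUs + training container
        "machine_spec": {
            "machine_type": "a2-highgpu-8g",
            "accelerator_type": "NVIDIA_TESLA_A100",
            "accelerator_count": 8,
        },
        "replica_count": 1,
        "container_spec": {"image_uri": TRAINER_IMAGE},
    },
    {   # pool 1: remaining training workers -- same GPUs and container
        "machine_spec": {
            "machine_type": "a2-highgpu-8g",
            "accelerator_type": "NVIDIA_TESLA_A100",
            "accelerator_count": 8,
        },
        "replica_count": 3,
        "container_spec": {"image_uri": TRAINER_IMAGE},
    },
    {   # pool 2: Reduction Server -- no accelerators; choose a
        # machine type that prioritizes network bandwidth
        "machine_spec": {"machine_type": "n1-highcpu-16"},
        "replica_count": 4,
        "container_spec": {"image_uri": REDUCTION_SERVER_IMAGE},
    },
]
```

These specs would then be passed to the job submission call (for example `aiplatform.CustomJob(worker_pool_specs=...)` in the Vertex AI SDK, or the equivalent REST payload).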

Topic 1 Question 210


You have trained a model by using data that was preprocessed in a batch Dataflow pipeline. Your use case requires real-time inference. You want to ensure that the data preprocessing logic is applied consistently between training and serving. What should you do?

  • A. Perform data validation to ensure that the input data to the pipeline is the same format as the input data to the endpoint.
  • B. Refactor the transformation code in the batch data pipeline so that it can be used outside of the pipeline. Use the same code in the endpoint.
  • C. Refactor the transformation code in the batch data pipeline so that it can be used outside of the pipeline. Share this code with the end users of the endpoint.
  • D. Batch the real-time requests by using a time window and then use the Dataflow pipeline to preprocess the batched requests. Send the preprocessed requests to the endpoint.
Suggested Answer: B 🗳️

Comments

fitri001
1 year ago
Selected Answer: B
Refactored Transformation Code: By refactoring the transformation code from the batch pipeline, you can create a reusable module that performs the same preprocessing steps. Same Code in Endpoint: Utilize the refactored code within your real-time inference endpoint. This ensures the data is preprocessed identically to how it was preprocessed during training.
upvoted 2 times
fitri001
1 year ago
A. Data Validation: While data validation is important, it doesn't guarantee consistent preprocessing logic. You need to ensure the same transformations are applied. C. Share Code with End Users: Sharing code with end-users might not be ideal, especially if it requires specific libraries or configurations for execution outside of the pipeline. D. Batching and Dataflow: Batching real-time requests for Dataflow processing might introduce latency and defeat the purpose of real-time inference.
upvoted 3 times
...
...
pinimichele01
1 year ago
Selected Answer: B
agree with guilhermebutzke
upvoted 1 times
...
guilhermebutzke
1 year, 2 months ago
Selected Answer: B
My Answer: B. This option ensures that the preprocessing logic used during training, which has already been validated and tested, is applied consistently during real-time inference. By making the transformation code reusable outside of the batch pipeline and utilizing it in the endpoint, you ensure that the same preprocessing steps are applied to incoming data during inference, thus maintaining consistency between training and serving. A: While data validation is essential, it only ensures the format; it doesn't guarantee consistent preprocessing logic between training and serving. C: Sharing code with end users might not be desirable for security or maintainability reasons. D: Batching introduces latency and might not be suitable for real-time needs. Additionally, using the entire Dataflow pipeline might be inefficient for individual requests.
upvoted 3 times
...
shadz10
1 year, 3 months ago
Selected Answer: B
The transformation logic code in the serving_fn function defines the serving interface of your SavedModel for online prediction. If you implement the same transformations that were used for preparing training data in the transformation logic code of the serving_fn function, it ensures that the same transformations are applied to new prediction data points when they're served. https://www.tensorflow.org/tfx/guide/tft_bestpractices
upvoted 3 times
...
pikachu007
1 year, 4 months ago
Selected Answer: B
A. Data validation: While essential, it doesn't guarantee consistency if the preprocessing logic itself differs between pipeline and endpoint. C. Sharing code with end users: This shifts the preprocessing burden to end users, potentially leading to inconsistencies and errors, and isn't feasible for real-time inference. D. Batching real-time requests: This introduces latency and might not align with real-time requirements, as users expect immediate responses.
upvoted 1 times
...
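The refactoring that option B describes boils down to keeping the transformation in one importable function used by both paths. A minimal stdlib-only sketch (the field names and transformations are hypothetical; in the real system the batch path would call this function from a Beam `Map`/`DoFn` in Dataflow, and the serving path from the endpoint's prediction handler):

```python
# preprocessing.py -- single source of truth for the transformation logic.
def preprocess(record: dict) -> dict:
    """Instance-level transformation applied identically at training
    (batch) and serving (online) time. Fields are hypothetical."""
    text = record["text"].strip().lower()
    return {"text": text, "length": len(text)}


# --- batch training path (stand-in for beam.Map(preprocess) in Dataflow) ---
training_rows = [{"text": "  Great PRODUCT  "}, {"text": "Bad fit"}]
training_features = list(map(preprocess, training_rows))

# --- online serving path (e.g. inside the endpoint's predict handler) ---
request = {"text": "  Great PRODUCT  "}
serving_features = preprocess(request)
```

Because both paths import the same `preprocess`, an identical raw input always produces identical features, which is exactly the training/serving consistency the question asks for.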

Topic 1 Question 211


You need to develop a custom TensorFlow model that will be used for online predictions. The training data is stored in BigQuery. You need to apply instance-level data transformations to the data for model training and serving. You want to use the same preprocessing routine during model training and serving. How should you configure the preprocessing routine?

  • A. Create a BigQuery script to preprocess the data, and write the result to another BigQuery table.
  • B. Create a pipeline in Vertex AI Pipelines to read the data from BigQuery and preprocess it using a custom preprocessing component.
  • C. Create a preprocessing function that reads and transforms the data from BigQuery. Create a Vertex AI custom prediction routine that calls the preprocessing function at serving time.
  • D. Create an Apache Beam pipeline to read the data from BigQuery and preprocess it by using TensorFlow Transform and Dataflow.
Suggested Answer: D 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 2 months ago
Selected Answer: D
My answer: D According to this documentation, it is very clear that using BigQuery is not a good approach for online prediction at the instance level. That's because we won't use the same code for both training and prediction serving. In the same documentation, the final table on the page recommends using Dataflow with TensorFlow Transform for instance-level data transformation. https://www.tensorflow.org/tfx/guide/tft_bestpractices
upvoted 8 times
...
OpenKnowledge
Most Recent 2 months, 1 week ago
Selected Answer: D
Apache Beam can be used to perform instance-level data transformations for both training and serving in machine learning workflows. This involves applying transformations to individual data points or records as they flow through the Beam pipeline. The key to successful serving is to apply the exact same preprocessing steps to incoming inference requests as were applied to the training data. This ensures that the model receives data in the format it was trained on. Apache Beam pipelines can be designed to perform these transformations on individual inference requests in real-time or near real-time. By leveraging Apache Beam for both training and serving, you can ensure consistency in data transformations, which is crucial for model performance and reliability. The ability to perform these transformations at the instance level allows for flexible and scalable data processing in ML pipelines.
upvoted 1 times
...
pinimichele01
1 year ago
Selected Answer: D
https://www.tensorflow.org/tfx/guide/tft_bestpractices#preprocessing_options_summary
upvoted 1 times
...
Yan_X
1 year, 3 months ago
Selected Answer: D
D - Apache Beam + tf.transform or Dataflow. https://notebook.community/GoogleCloudPlatform/training-data-analyst/courses/machine_learning/deepdive/04_advanced_preprocessing/a_dataflow
upvoted 2 times
...
BlehMaks
1 year, 3 months ago
Selected Answer: A
the simplest way
upvoted 1 times
...
shadz10
1 year, 3 months ago
Selected Answer: D
D- Vertex AI isn't designed for instance-level data transformations
upvoted 1 times
shadz10
1 year, 3 months ago
This document also provides an overview of TensorFlow Transform (tf.Transform), a library for TensorFlow that lets you define both instance-level and full-pass data transformation through data preprocessing pipelines. These pipelines are executed with Apache Beam, and they create artifacts that let you apply the same transformations during prediction as when the model is served. https://www.tensorflow.org/tfx/guide/tft_bestpractices
upvoted 1 times
...
...
shadz10
1 year, 3 months ago
D- Vertex AI isn't designed for instance-level data transformations
upvoted 3 times
...
pikachu007
1 year, 4 months ago
Selected Answer: C
Addressing limitations of other options: A. Data validation: While essential, it doesn't guarantee consistency if the preprocessing logic itself differs between pipeline and endpoint. C. Sharing code with end users: This shifts the preprocessing burden to end users, potentially leading to inconsistencies and errors, and isn't feasible for real-time inference. D. Batching real-time requests: This introduces latency and might not align with real-time requirements, as users expect immediate responses.
upvoted 2 times
...
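What makes the tf.Transform + Dataflow combination in option D work is the split between a full-pass *analyze* phase (computing dataset statistics once over the training data with Beam) and an instance-level *transform* that has those statistics baked in and is replayed unchanged at serving time. Below is a pure-Python stand-in for that idea; it is a conceptual sketch, not the tensorflow_transform API, with z-score scaling chosen as the example transformation:

```python
import math

def analyze(training_values):
    """Full-pass phase: compute dataset statistics once over the
    training data (analogous to tf.Transform's Beam analyzers)."""
    mean = sum(training_values) / len(training_values)
    var = sum((v - mean) ** 2 for v in training_values) / len(training_values)
    return {"mean": mean, "std": math.sqrt(var)}

def make_transform(stats):
    """Return an instance-level transform with the statistics baked in.
    The same callable is applied to training examples and, later, to
    each online prediction request."""
    def transform(value):
        return (value - stats["mean"]) / stats["std"]
    return transform

stats = analyze([10.0, 20.0, 30.0, 40.0])            # training-time analysis
transform = make_transform(stats)
train_scaled = [transform(v) for v in [10.0, 20.0, 30.0, 40.0]]
serve_scaled = transform(25.0)                       # serving-time request
```

In tf.Transform the analyze phase runs once on Dataflow and the resulting transform is exported as part of the serving graph, so training and serving cannot drift apart.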

Topic 1 Question 212


You are pre-training a large language model on Google Cloud. This model includes custom TensorFlow operations in the training loop. Model training will use a large batch size, and you expect training to take several weeks. You need to configure a training architecture that minimizes both training time and compute costs. What should you do?

  • A. Implement 8 workers of a2-megagpu-16g machines by using tf.distribute.MultiWorkerMirroredStrategy.
  • B. Implement a TPU Pod slice with --accelerator-type=v4-128 by using tf.distribute.TPUStrategy.
  • C. Implement 16 workers of c2d-highcpu-32 machines by using tf.distribute.MirroredStrategy.
  • D. Implement 16 workers of a2-highgpu-8g machines by using tf.distribute.MultiWorkerMirroredStrategy.
Suggested Answer: B 🗳️

Comments

pikachu007
Highly Voted 1 year, 10 months ago
Selected Answer: B
TPU Advantages: Highly Specialized: TPUs (Tensor Processing Units) are custom-designed hardware accelerators specifically optimized for machine learning workloads, particularly those involving large batch sizes and matrix-heavy computations, common in large language models. Exceptional Performance: TPUs can significantly outperform CPUs and GPUs in terms of speed and efficiency for these types of tasks. Cost-Effective: While TPUs might have a higher hourly cost, their exceptional performance often leads to lower overall costs due to faster training times and reduced resource usage. TPU Pod Slice: Scalability: TPU Pod slices allow you to distribute training across multiple TPUv4 chips for even greater performance and scalability. Custom Operations: The tf.distribute.TPUStrategy ensures compatibility with custom TensorFlow operations,
upvoted 10 times
...
AK2020
Highly Voted 1 year, 3 months ago
Selected Answer: A
B is not correct as TPUs are not suitable for custom TensorFlow operations, and C doesn't make any sense. A or D? I would go with A
upvoted 5 times
...
Fer660
Most Recent 2 months ago
Selected Answer: A
Not B: TPU does not support custom TF operations in the main training loop. https://cloud.google.com/tpu/docs/intro-to-tpu Not C: clearly not a CPU setup. I am torn between A and D. Splitting across more machines introduces more overhead, so I am a bit inclined toward A. I guess one would have to work out the dollar costs of the machine types in A and D
upvoted 1 times
...
NamitSehgal
8 months, 4 weeks ago
Answer is B designed and highly optimized for the type of large matrix multiplications and computations involved in training large language models
upvoted 1 times
...
Omi_04040
11 months ago
Selected Answer: A
The question says "model includes custom TensorFlow operations in the training loop", this is not supported by TPU. Hence A
upvoted 4 times
...
Pau1234
11 months ago
Selected Answer: D
TPUs are not suitable since we are talking about custom operations. Then between A and D, I'd go with D, because it is more cost effective than A; 16g will be more expensive.
upvoted 1 times
...
9fbd29a
11 months, 2 weeks ago
Selected Answer: A
TPUs not recommended for custom operations
upvoted 3 times
...
DaleR
11 months, 3 weeks ago
B is wrong:
upvoted 2 times
...
f084277
12 months ago
All the people voting B are wrong. TPUs cannot be used with TF custom operations
upvoted 4 times
...
baimus
1 year, 2 months ago
Selected Answer: A
This could be A or D, because they both will perform well with custom TensorFlow operations. A is likely to be better with large batch sizes, which require bigger GPUs, so I went with A.
upvoted 3 times
...
info_appsatori
1 year, 4 months ago
Should be A or D. TPU is ok, but TPUs not suitable for TensorFlow custom operations.
upvoted 2 times
...
ccb23cc
1 year, 5 months ago
Selected Answer: A
B. TPU Acceleration: the question says that it uses TensorFlow custom operations in the main loop, and the Google documentation literally says about TPU use: "Models with no custom TensorFlow/PyTorch/JAX operations inside the main training loop" C. High-CPU Machines: Makes no sense because it tells you to use CPUs (which do not help us in this case). So the correct answer is between A and D. However, the question says that they are planning to use a large batch size, so we need memory; therefore we should take the machines with more of it. Correct answer: Option A
upvoted 4 times
...
fitri001
1 year, 6 months ago
Selected Answer: B
TPU Acceleration: TPUs are specifically designed for machine learning workloads and offer significant speedups compared to GPUs or CPUs, especially for large models like yours. Utilizing a TPU Pod slice provides access to a collection of interconnected TPUs for efficient parallel training. tf.distribute.TPUStrategy: This strategy is specifically designed to work with TPUs in TensorFlow. It handles data distribution, model replication, and gradient aggregation across the TPU cores, enabling efficient training with custom TensorFlow operations.
upvoted 3 times
fitri001
1 year, 6 months ago
why not the others? A. MultiWorkerMirroredStrategy with GPUs: While GPUs offer some acceleration, TPUs are generally better suited for large language model pre-training due to their architectural optimizations. Additionally, managing 8 workers across separate machines can introduce communication overhead compared to a tightly coupled TPU Pod. C. MirroredStrategy with High-CPU Machines: CPU-based training would be significantly slower than TPUs or even GPUs for a large language model. While the high CPU count might seem beneficial for custom operations, the overall training speed would still be limited. D. MultiWorkerMirroredStrategy with Multiple High-GPU Machines: Similar to option A, using multiple high-GPU machines with this strategy would incur communication overhead and potentially be less cost-effective compared to a single TPU Pod slice.
upvoted 3 times
...
...
BlehMaks
1 year, 10 months ago
Selected Answer: B
It should be TPU but i'm a bit concerned about this point from Google documentation: Models with no custom TensorFlow/PyTorch/JAX operations inside the main training loop https://cloud.google.com/tpu/docs/intro-to-tpu#TPU
upvoted 3 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: B
B. NGL quite lost on this one but if the training set is big enough to span over several weeks I would go with the most powerful resource (TPUs) but I might be completely wrong.
upvoted 4 times
...
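For readers who do go the TPU route of option B, the setup follows TensorFlow's documented TPUStrategy pattern. The sketch below is pseudocode: the TPU address, `create_model`, and the dataset are placeholders, and note the caveat raised in several comments that custom TensorFlow operations inside the main training loop may not be TPU-compatible, which is the main argument for the GPU-based options.

```
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="...")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    model = create_model()            # placeholder: build the language model
    model.compile(optimizer="adam", loss=...)

model.fit(train_dataset, epochs=...)  # large global batch sharded across the Pod slice
```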

Topic 1 Question 213


You are building a TensorFlow text-to-image generative model by using a dataset that contains billions of images with their respective captions. You want to create a low-maintenance, automated workflow that reads the data from a Cloud Storage bucket, collects statistics, splits the dataset into training/validation/test datasets, performs data transformations, trains the model using the training/validation datasets, and validates the model by using the test dataset. What should you do?

  • A. Use the Apache Airflow SDK to create multiple operators that use Dataflow and Vertex AI services. Deploy the workflow on Cloud Composer.
  • B. Use the MLFlow SDK and deploy it on a Google Kubernetes Engine cluster. Create multiple components that use Dataflow and Vertex AI services.
  • C. Use the Kubeflow Pipelines (KFP) SDK to create multiple components that use Dataflow and Vertex AI services. Deploy the workflow on Vertex AI Pipelines.
  • D. Use the TensorFlow Extended (TFX) SDK to create multiple components that use Dataflow and Vertex AI services. Deploy the workflow on Vertex AI Pipelines.
Suggested Answer: D 🗳️

Comments

pinimichele01
Highly Voted 1 year, 7 months ago
Selected Answer: C
If you use TensorFlow in an ML workflow that processes terabytes of structured data or text data, we recommend that you build your pipeline using TFX. For other use cases, we recommend that you build your pipeline using the Kubeflow Pipelines SDK https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline#sdk
upvoted 6 times
...
alja12
Most Recent 4 months, 2 weeks ago
Selected Answer: D
I would vote for D. If you read the question carefully you will see that you're processing billions of images with captions (text + images), so we have to preprocess text. Plus, you want to build a TensorFlow text-to-image generative model. For me this points to option D, although at the very beginning C was my primary choice.
upvoted 2 times
...
kornick
10 months ago
Selected Answer: C
TFX -> processes terabytes of structured data or text data
upvoted 1 times
...
wences
1 year, 1 month ago
Selected Answer: D
in this one will go with D, TFX is more specialized than kfp
upvoted 2 times
...
baimus
1 year, 2 months ago
Selected Answer: D
TFX is going to be easier than kubeflow with custom code, as it basically does exactly what is listed there, by default.
upvoted 3 times
...
dija123
1 year, 4 months ago
Selected Answer: D
Agree with TFX
upvoted 2 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: D
D) TFX is the way forward as it has services to support every step of the use case presented.
upvoted 3 times
...
fitri001
1 year, 6 months ago
Selected Answer: C
KFP Pipelines: Kubeflow Pipelines (KFP) is a popular open-source framework for building and deploying machine learning workflows. It provides a user-friendly SDK for defining pipelines as components and simplifies workflow orchestration. Vertex AI Pipelines Integration: Vertex AI Pipelines is a managed service from Google Cloud that integrates seamlessly with KFP. You can deploy your KFP-defined workflow on Vertex AI Pipelines, leveraging its features like scheduling, monitoring, and versioning. Dataflow and Vertex AI Services: Both Dataflow and Vertex AI are Google Cloud services well-suited for this workflow
upvoted 2 times
fitri001
1 year, 6 months ago
why not others? A. Airflow with Dataflow and Vertex AI: While Airflow is a powerful workflow management tool, deploying it on Cloud Composer adds additional complexity compared to the managed environment of Vertex AI Pipelines. B. MLflow with Dataflow and Vertex AI: MLflow focuses primarily on model lifecycle management. While it can be used for building pipelines, KFP offers a more specialized and user-friendly approach for this specific use case. D. TFX with Dataflow and Vertex AI: TFX is a comprehensive end-to-end ML platform. While it offers several functionalities, it might be an overkill for this scenario focusing on data processing, training, and validation. KFP provides a simpler solution for this specific workflow.
upvoted 3 times
...
...
winston9
1 year, 9 months ago
Selected Answer: D
C and D are valid options. If the model is created in TensorFlow, use TFX; in any other case, use KFP. Therefore, the answer here is D.
upvoted 2 times
pinimichele01
1 year, 7 months ago
https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline#sdk
upvoted 1 times
...
...
BlehMaks
1 year, 10 months ago
Selected Answer: C
https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline#sdk
upvoted 3 times
...
pikachu007
1 year, 10 months ago
Selected Answer: D
Airflow (A): While versatile, Airflow often requires more manual configuration and integration with ML services, potentially increasing maintenance effort. MLFlow (B): MLFlow focuses on experiment tracking and model management, lacking built-in pipeline components for data processing and model training. Kubeflow Pipelines (C): KFP is flexible but requires more setup and infrastructure management compared to TFX's managed services.
upvoted 3 times
pinimichele01
1 year, 7 months ago
https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline#sdk
upvoted 1 times
...
...
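The workflow in the question maps almost one-to-one onto standard TFX components, which is the argument for answer D. A pseudocode sketch (the bucket path and module files are placeholders; ExampleGen performs the read and the train/eval split, StatisticsGen collects statistics, Transform applies the data transformations on Dataflow, and Trainer/Evaluator train and validate the model):

```
example_gen    = tfx.components.ImportExampleGen(input_base="gs://my-bucket/captioned-images")
statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs["examples"])
schema_gen     = tfx.components.SchemaGen(statistics=statistics_gen.outputs["statistics"])
transform      = tfx.components.Transform(examples=example_gen.outputs["examples"],
                                          schema=schema_gen.outputs["schema"],
                                          module_file="preprocessing.py")
trainer        = tfx.components.Trainer(examples=transform.outputs["transformed_examples"],
                                        transform_graph=transform.outputs["transform_graph"],
                                        module_file="model.py")
evaluator      = tfx.components.Evaluator(examples=example_gen.outputs["examples"],
                                          model=trainer.outputs["model"])
# Compile the pipeline and submit it to Vertex AI Pipelines for a
# low-maintenance, managed run.
```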

Topic 1 Question 214


You are developing an ML pipeline using Vertex AI Pipelines. You want your pipeline to upload a new version of the XGBoost model to Vertex AI Model Registry and deploy it to Vertex AI Endpoints for online inference. You want to use the simplest approach. What should you do?

  • A. Use the Vertex AI REST API within a custom component based on a vertex-ai/prediction/xgboost-cpu image
  • B. Use the Vertex AI ModelEvaluationOp component to evaluate the model
  • C. Use the Vertex AI SDK for Python within a custom component based on a python:3.10 image
  • D. Chain the Vertex AI ModelUploadOp and ModelDeployOp components together
Suggested Answer: D 🗳️

Comments

fitri001
1 year ago
Selected Answer: D
Built-in Functionality: Both ModelUploadOp and ModelDeployOp are pre-built components within Vertex AI Pipelines specifically designed for uploading models and deploying them to endpoints. Ease of Use: These components offer a user-friendly interface within the pipeline definition. You only need to specify essential details like the model path, container image URI (pre-built for XGBoost is available), endpoint configuration, etc. Reduced Code Complexity: Using these components eliminates the need for writing custom code within your pipeline for model upload and deployment, simplifying your pipeline logic.
upvoted 3 times
fitri001
1 year ago
why not the others? A. Custom Component with Vertex AI REST API: While this approach provides flexibility, it requires writing custom code to interact with the Vertex AI REST API within a container image. This adds complexity compared to using pre-built components. B. ModelEvaluationOp: This component is designed for model evaluation within the pipeline, not for uploading or deploying models. C. Custom Component with Python SDK: Similar to option A, using the Python SDK within a custom component offers flexibility but requires writing more code compared to using the pre-built ModelUploadOp and ModelDeployOp components.
upvoted 2 times
...
...
pinimichele01
1 year, 1 month ago
Selected Answer: D
https://cloud.google.com/vertex-ai/docs/pipelines/model-endpoint-component
upvoted 2 times
...
shadz10
1 year, 3 months ago
Selected Answer: D
https://cloud.google.com/vertex-ai/docs/pipelines/model-endpoint-component
upvoted 2 times
...
pikachu007
1 year, 4 months ago
Selected Answer: D
A. Custom Component with REST API: This involves more manual coding and understanding of REST API endpoints, potentially increasing complexity and maintenance. B. ModelEvaluationOp: This component is primarily for model evaluation, not model upload and deployment. C. Custom Component with SDK: While feasible, it involves more setup and dependency management compared to using built-in components.
upvoted 1 times
...
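Answer D in sketch form, chaining the prebuilt components. Treat this as pseudocode: component parameter names differ across google-cloud-pipeline-components versions, and the model artifact arguments are elided.

```
from kfp import dsl
from google_cloud_pipeline_components.v1.model import ModelUploadOp
from google_cloud_pipeline_components.v1.endpoint import EndpointCreateOp, ModelDeployOp

@dsl.pipeline(name="xgb-upload-and-deploy")
def pipeline(project: str, location: str):
    upload = ModelUploadOp(project=project, location=location,
                           display_name="xgb-model", ...)   # model artifact args elided
    endpoint = EndpointCreateOp(project=project, location=location,
                                display_name="xgb-endpoint")
    ModelDeployOp(model=upload.outputs["model"],
                  endpoint=endpoint.outputs["endpoint"],
                  dedicated_resources_machine_type="n1-standard-4",
                  dedicated_resources_min_replica_count=1)
```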

Topic 1 Question 215


You work for an online retailer. Your company has a few thousand short lifecycle products. Your company has five years of sales data stored in BigQuery. You have been asked to build a model that will make monthly sales predictions for each product. You want to use a solution that can be implemented quickly with minimal effort. What should you do?

  • A. Use Prophet on Vertex AI Training to build a custom model.
  • B. Use Vertex AI Forecast to build a NN-based model.
  • C. Use BigQuery ML to build a statistical ARIMA_PLUS model.
  • D. Use TensorFlow on Vertex AI Training to build a custom model.
Suggested Answer: C 🗳️

Comments

pikachu007
Highly Voted 1 year, 4 months ago
Selected Answer: C
Ease of Use: BigQuery ML integrates seamlessly with BigQuery, allowing you to create and train models directly within SQL queries, eliminating the need for separate environments or coding. Statistical ARIMA_PLUS Strengths: This model is well-suited for time series forecasting, automatically handling seasonality, trends, and holidays, making it appropriate for monthly sales predictions. Minimal Effort: BigQuery ML handles model training and tuning, reducing the need for manual configuration or hyperparameter tuning. Fast Implementation: Model creation and training can be done in a few lines of SQL, enabling rapid deployment.
upvoted 5 times
...
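As a concrete illustration of option C, the ARIMA_PLUS training statement really is only a few lines of BigQuery ML SQL. This is a minimal sketch; the dataset, table, and column names (`sales.history`, `sale_month`, `units_sold`, `product_id`) are hypothetical placeholders, not from the question:

```python
# Minimal sketch of a BigQuery ML ARIMA_PLUS training statement (option C).
# Table and column names below are hypothetical placeholders.
def arima_plus_ddl(table: str, time_col: str, target_col: str, id_col: str) -> str:
    """Build a CREATE MODEL statement for per-product ARIMA_PLUS forecasting."""
    return f"""
CREATE OR REPLACE MODEL `sales.monthly_forecast`
OPTIONS(
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = '{time_col}',
  time_series_data_col = '{target_col}',
  time_series_id_col = '{id_col}'  -- one series per product
) AS
SELECT {time_col}, {target_col}, {id_col}
FROM `{table}`
"""

sql = arima_plus_ddl("sales.history", "sale_month", "units_sold", "product_id")
# The statement would then be run with the BigQuery client, e.g.:
# from google.cloud import bigquery
# bigquery.Client().query(sql).result()
```

The `time_series_id_col` option is what lets a single model cover a few thousand products at once.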
fitri001
Most Recent 1 year ago
Selected Answer: C
Quick Implementation: BigQuery ML simplifies the process. You can train and deploy the model directly within BigQuery, eliminating the need for complex model deployment or data movement. Minimal Effort: ARIMA_PLUS is a pre-built statistical model available in BigQuery ML. You don't need to write custom code for a complex neural network (NN) model like in option B or D. Time Series Data: ARIMA models are well-suited for time series forecasting, which is ideal for your monthly sales prediction task.
upvoted 3 times
fitri001
1 year ago
why not others? A. Prophet on Vertex AI Training: While Prophet is a good choice for time series forecasting with holidays and seasonality, using Vertex AI Training requires additional setup and potentially custom code compared to the readily available ARIMA_PLUS model within BigQuery ML. B. Vertex AI Forecast with NN-based Model: Building a custom NN-based model using Vertex AI Forecast offers flexibility but requires more effort and expertise in model development and potentially hyperparameter tuning. This might not be ideal for a quick implementation. D. TensorFlow on Vertex AI Training: Similar to option B, using TensorFlow for a custom model offers flexibility but requires significant coding and expertise, making it less suitable for a quick and low-effort approach.
upvoted 2 times
...
...
pinimichele01
1 year, 1 month ago
Selected Answer: C
data on bigquery + minimal effort -> C
upvoted 1 times
...
b1a8fae
1 year, 3 months ago
Selected Answer: C
Given amount of data (few thousand short-cycled products) and frequency of predictions (monthly) C is the way to go.
upvoted 1 times
...

Topic 1 Question 216

You are creating a model training pipeline to predict sentiment scores from text-based product reviews. You want to have control over how the model parameters are tuned, and you will deploy the model to an endpoint after it has been trained. You will use Vertex AI Pipelines to run the pipeline. You need to decide which Google Cloud pipeline components to use. What components should you choose?

  • A. TabularDatasetCreateOp, CustomTrainingJobOp, and EndpointCreateOp
  • B. TextDatasetCreateOp, AutoMLTextTrainingOp, and EndpointCreateOp
  • C. TabularDatasetCreateOp, AutoMLTextTrainingOp, and ModelDeployOp
  • D. TextDatasetCreateOp, CustomTrainingJobOp, and ModelDeployOp
Suggested Answer: D 🗳️

Comments

VinaoSilva
1 year, 4 months ago
Selected Answer: D
"Text dataset -> TextDatasetCreateOp Control over parameters -> CustomTrainingJobOp"
upvoted 2 times
...
fitri001
1 year, 6 months ago
Selected Answer: D
TextDatasetCreateOp: This component is specifically designed to handle text-based data like product reviews. It reads and prepares the text data for training the model.
CustomTrainingJobOp: Since you want control over hyperparameter tuning, a custom training job is the most suitable option. This component allows you to define your training script using a framework like TensorFlow and configure hyperparameters for optimization.
ModelDeployOp: After training, this component uploads the trained model to the Vertex AI Model Registry and deploys it to a Vertex AI Endpoint for serving predictions.
upvoted 3 times
fitri001
1 year, 6 months ago
why not others?
A. TabularDatasetCreateOp and EndpointCreateOp: TabularDatasetCreateOp is designed for tabular data, not raw text. EndpointCreateOp creates an endpoint, but you need a model upload step before deployment (handled by ModelDeployOp).
B. AutoMLTextTrainingOp: While AutoML offers convenience, it removes control over hyperparameter tuning, which you require.
C. TabularDatasetCreateOp and AutoMLTextTrainingOp: Similar to option A, TabularDatasetCreateOp is not ideal for text data, and AutoML removes hyperparameter control.
upvoted 3 times
...
...
pinimichele01
1 year, 7 months ago
Selected Answer: D
D fits perfect
upvoted 1 times
...
vaibavi
1 year, 9 months ago
Selected Answer: D
D AutoML uses a predefined set of hyperparameter values for each algorithm used in model training. We can not have a control over hyperparameter
upvoted 2 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: D
Text dataset -> TextDatasetCreateOp Control over parameters -> CustomTrainingJobOp
upvoted 3 times
...
pikachu007
1 year, 10 months ago
Selected Answer: D
TextDatasetCreateOp: This component is specifically designed to create datasets from text-based data, essential for handling product reviews.
CustomTrainingJobOp: This component provides full control over the training process, allowing you to specify model architecture, hyperparameter tuning strategies, and other training parameters, aligning with the requirement for control over model tuning.
ModelDeployOp: This component streamlines model deployment to a Vertex AI endpoint for real-time or batch inference, enabling the trained model to serve predictions.
upvoted 1 times
...
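The option-D pipeline is a three-step DAG: the dataset feeds the custom training job, whose model feeds the deploy step. The sketch below models only that wiring as plain Python data (no kfp dependency); the component names are the real ones from google-cloud-pipeline-components, while the bucket path and hyperparameter argument are hypothetical illustrations:

```python
# Sketch of the option-D pipeline wiring as plain data (no kfp dependency).
# In a real Vertex AI pipeline these would be the corresponding components
# imported from the google-cloud-pipeline-components package.
PIPELINE_STEPS = [
    {"component": "TextDatasetCreateOp",   # create the text dataset from reviews
     "inputs": {"gcs_source": "gs://my-bucket/reviews.jsonl"}},      # hypothetical
    {"component": "CustomTrainingJobOp",   # custom training: full tuning control
     "inputs": {"dataset": "TextDatasetCreateOp.outputs.dataset",
                "args": ["--learning-rate=0.001"]}},                 # example flag
    {"component": "ModelDeployOp",         # deploy the trained model to an endpoint
     "inputs": {"model": "CustomTrainingJobOp.outputs.model"}},
]

def downstream_of(step_name: str) -> list:
    """Return components that consume an upstream step's outputs."""
    return [s["component"] for s in PIPELINE_STEPS
            if any(str(v).startswith(step_name) for v in s["inputs"].values())]
```

Tracing the edges makes the ordering explicit: the dataset op only feeds training, and training only feeds deployment.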

Topic 1 Question 217

Your team frequently creates new ML models and runs experiments. Your team pushes code to a single repository hosted on Cloud Source Repositories. You want to create a continuous integration pipeline that automatically retrains the models whenever there is any modification of the code. What should be your first step to set up the CI pipeline?

  • A. Configure a Cloud Build trigger with the event set as "Pull Request"
  • B. Configure a Cloud Build trigger with the event set as "Push to a branch"
  • C. Configure a Cloud Function that builds the repository each time there is a code change
  • D. Configure a Cloud Function that builds the repository each time a new branch is created
Suggested Answer: B 🗳️

Comments

OpenKnowledge
2 months, 1 week ago
Selected Answer: B
Pull Request triggers allow for automated checks (e.g., unit tests, linting, security scans) on the code before it's merged into a main or protected branch. When a pull request is opened or updated, a Cloud Build trigger can initiate builds and tests on the proposed changes within the feature branch.
Push to Branch triggers are typically used for deploying validated code to environments (e.g., development, staging, production) after it has been reviewed and merged. They are used for post-merge actions and continuous deployment. When changes are pushed directly to a specific branch (e.g., main, dev, prod), a Cloud Build trigger can initiate builds, deployments, or other actions.
upvoted 1 times
...
forport
1 year, 3 months ago
Selected Answer: B
B. According to Gemini-Advanced.
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: B
Continuous Integration: CI pipelines aim for frequent integration of code changes. Triggering the build pipeline upon every push to a branch (including the main branch) ensures your models retrain whenever the code relevant to them is modified.
Focus on Relevant Changes: Compared to option A ("Pull Request"), triggering on pushes allows retraining even for direct pushes to the main branch, not just pull request merges. This can be crucial for catching critical code changes that might bypass pull requests.
upvoted 3 times
fitri001
1 year, 6 months ago
C. Cloud Function for Code Changes: While Cloud Functions can be used for CI pipelines, manually configuring a function for every code change might become cumbersome and less scalable compared to a dedicated CI/CD service like Cloud Build with built-in triggering functionalities.
D. Cloud Function for New Branches: Triggering on new branch creation alone wouldn't retrain models on existing branches where your team actively works. You'd need an additional trigger for existing branches (e.g., push to branch) to achieve automatic retraining.
upvoted 1 times
...
...
pinimichele01
1 year, 7 months ago
Selected Answer: B
For ANY modifications, “Push to a branch” is the best choice in Cloud Build trigger.
upvoted 2 times
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: B
My Answer B: For ANY modifications, “Push to a branch” is the best choice in Cloud Build trigger. However, when it comes to ML model training, retraining models on every push might be resource-intensive, especially if the training process is computationally expensive. So, I think triggering the CI pipeline on a pull request, which allows changes to be tested before merging into the main branch, would be a better choice …
upvoted 1 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: B
B. Any code change on the Cloud repo is done by pushing to a branch.
upvoted 1 times
...
pikachu007
1 year, 10 months ago
Selected Answer: B
Cloud Build Integration: Cloud Build is Google Cloud's fully managed CI/CD platform, designed to automate builds and deployments, making it ideal for this task.
Trigger on Code Pushes: Setting the trigger event to "Push to a branch" ensures that the pipeline automatically activates whenever new code is pushed to any branch of the repository, aligning with the goal of retraining models on code modifications.
upvoted 4 times
...
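Option B corresponds to a single Cloud Build trigger resource. The sketch below assembles a trigger body in the shape of the Cloud Build API's BuildTrigger resource (`triggerTemplate` with `repoName` and a `branchName` regex); the repository name and build config filename are hypothetical, and the `.*` regex is what makes the trigger fire on a push to any branch:

```python
# Sketch of a Cloud Build "push to a branch" trigger body (option B).
# Repo name and filename are hypothetical; ".*" matches pushes to any branch.
def make_push_trigger(repo_name: str, branch_regex: str = ".*") -> dict:
    """Build a Cloud Build trigger resource for a Cloud Source Repositories repo."""
    return {
        "name": "retrain-on-push",
        "description": "Retrain ML models on any code modification",
        "triggerTemplate": {             # fires on pushes to matching branches
            "repoName": repo_name,
            "branchName": branch_regex,  # a regex, not a literal branch name
        },
        "filename": "cloudbuild.yaml",   # build steps that launch retraining
    }

trigger = make_push_trigger("ml-experiments")
```

Because `branchName` is a regular expression, `.*` covers every branch, matching the "any modification of the code" requirement.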

Topic 1 Question 218

You have built a custom model that performs several memory-intensive preprocessing tasks before it makes a prediction. You deployed the model to a Vertex AI endpoint, and validated that results were received in a reasonable amount of time. After routing user traffic to the endpoint, you discover that the endpoint does not autoscale as expected when receiving multiple requests. What should you do?

  • A. Use a machine type with more memory
  • B. Decrease the number of workers per machine
  • C. Increase the CPU utilization target in the autoscaling configurations.
  • D. Decrease the CPU utilization target in the autoscaling configurations
Suggested Answer: D 🗳️

Comments

b1a8fae
Highly Voted 1 year, 10 months ago
Selected Answer: D
D. The idea behind this question is getting autoscaling to handle the fluctuating inflow of requests well. Changing the machine (A) is not related to autoscaling, and you might not be using the full potential of the machine the whole time, but rather only during instances of peak traffic. You need to lower the autoscaling threshold (the target utilization metric mentioned in the options is CPU, so we will go with this) so you make use of more resources whenever too many memory-intensive requests are happening. https://cloud.google.com/compute/docs/autoscaler/scaling-cpu#scaling_based_on_cpu_utilization https://cloud.google.com/compute/docs/autoscaler#autoscaling_policy
upvoted 11 times
b1a8fae
1 year, 10 months ago
Addition: although memory-intensive is not directly related to CPU, for me the key is "the model does not autoscale as expected". To me this is addressing directly the settings of autoscaling, which won't change by changing the machine.
upvoted 2 times
...
...
pikachu007
Highly Voted 1 year, 10 months ago
Selected Answer: A
B. Decreasing Workers: This might reduce memory usage per machine but could also decrease overall throughput, potentially impacting performance.
C. Increasing CPU Utilization Target: This wouldn't directly address the memory bottleneck and could trigger unnecessary scaling based on CPU usage, not memory requirements.
D. Decreasing CPU Utilization Target: This could lead to premature scaling, potentially increasing costs without addressing the root cause.
upvoted 6 times
...
VinaoSilva
Most Recent 1 year, 4 months ago
Selected Answer: D
"use autoscale" = deacrease cpu utilization target
upvoted 2 times
...
fitri001
1 year, 6 months ago
Selected Answer: D
D. Decrease the CPU utilization target: This is the most suitable approach. By lowering the CPU utilization target, the endpoint will scale up at a lower CPU usage level. This increases the likelihood of scaling up when the memory-intensive preprocessing tasks cause a rise in CPU utilization, even though memory is the root cause.
upvoted 3 times
fitri001
1 year, 6 months ago
A. Use a machine type with more memory: While this might seem logical, autoscaling in Vertex AI endpoints relies on CPU utilization as the metric, not directly on memory usage. Even with more memory, the endpoint might not scale up if CPU utilization remains below the threshold.
B. Decrease the number of workers per machine (not applicable to Vertex AI Endpoints): This option might be relevant for some serving frameworks, but Vertex AI Endpoints don't typically use a worker concept. Scaling down workers wouldn't directly address the memory bottleneck.
C. Increase the CPU utilization target: This would instruct the endpoint to scale up only when CPU usage reaches a higher threshold. Since the issue is memory usage, increasing the CPU target wouldn't trigger scaling when memory is the limiting factor.
upvoted 2 times
...
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: D
Option D, "Decrease the CPU utilization target in the autoscaling configurations," could be a valid approach to address the issue of autoscaling and anticipate spikes in traffic. By lowering the threshold, the autoscaling system would initiate scaling actions at a lower CPU utilization level, allowing for a more proactive response to increasing demands.
upvoted 3 times
...
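The effect of option D follows from the standard target-utilization scaling rule: desired replicas ≈ current replicas × observed utilization / target utilization, so a lower target makes the same observed load request more replicas. A minimal sketch of that rule, not Vertex AI's exact implementation:

```python
import math

def desired_replicas(current_replicas: int, observed_cpu: float, target_cpu: float) -> int:
    """Target-utilization scaling rule: scale out when observed exceeds target."""
    return max(1, math.ceil(current_replicas * observed_cpu / target_cpu))

# Same load (70% observed CPU on 2 replicas), different targets:
high_target = desired_replicas(2, observed_cpu=0.70, target_cpu=0.80)  # stays at 2
low_target = desired_replicas(2, observed_cpu=0.70, target_cpu=0.50)   # scales out to 3
```

With the 0.80 target the endpoint never scales out under this load; dropping the target to 0.50 triggers scale-out from the same traffic, which is exactly the fix the question asks for.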

Topic 1 Question 219

Your company manages an ecommerce website. You developed an ML model that recommends additional products to users in near real time based on items currently in the user’s cart. The workflow will include the following processes:

1. The website will send a Pub/Sub message with the relevant data and then receive a message with the prediction from Pub/Sub
2. Predictions will be stored in BigQuery
3. The model will be stored in a Cloud Storage bucket and will be updated frequently

You want to minimize prediction latency and the effort required to update the model. How should you reconfigure the architecture?

  • A. Write a Cloud Function that loads the model into memory for prediction. Configure the function to be triggered when messages are sent to Pub/Sub.
  • B. Create a pipeline in Vertex AI Pipelines that performs preprocessing, prediction, and postprocessing. Configure the pipeline to be triggered by a Cloud Function when messages are sent to Pub/Sub.
  • C. Expose the model as a Vertex AI endpoint. Write a custom DoFn in a Dataflow job that calls the endpoint for prediction.
  • D. Use the RunInference API with WatchFilePattern in a Dataflow job that wraps around the model and serves predictions.
Suggested Answer: D 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 9 months ago
Selected Answer: D
My answer: D. This Google documentation explains: "Instead of deploying the model to an endpoint, you can use the RunInference API to serve machine learning models in your Apache Beam pipeline. This approach has several advantages, including flexibility and portability." https://cloud.google.com/blog/products/ai-machine-learning/streaming-prediction-with-dataflow-and-vertex This documentation uses RunInference and WatchFilePattern "to automatically update the ML model without stopping the Apache Beam" pipeline: https://cloud.google.com/dataflow/docs/notebooks/automatic_model_refresh So, for "minimize prediction latency" RunInference is suggested, while for "effort required to update the model" WatchFilePattern is the best approach. I think D is the best option
upvoted 7 times
...
4d742d7
Most Recent 5 months ago
Selected Answer: C
Vertex AI endpoints are built for online prediction and automatically serve updated models when deployed. You don’t need to reload the model in memory manually. Dataflow is scalable, stream-friendly, and integrates well with Pub/Sub for input and BigQuery for output. Writing a custom DoFn in Dataflow gives you flexibility to call the endpoint using the prediction API and process results efficiently.
upvoted 3 times
...
phani49
10 months, 3 weeks ago
Selected Answer: D
Exposing the model as a Vertex AI endpoint and using Dataflow with a custom DoFn provides the optimal solution for real-time predictions with minimal latency. https://cloud.google.com/blog/products/ai-machine-learning/streaming-prediction-with-dataflow-and-vertex
upvoted 1 times
...
lunalongo
11 months, 1 week ago
Selected Answer: A
A is the best option because:
- Minimizes Latency: Loading the model into the Cloud Function's memory eliminates the overhead of loading the model from storage for each prediction request. This significantly reduces latency, crucial for near real-time recommendations. The function is triggered directly by Pub/Sub messages, further streamlining the process.
- Simplified Model Updates: Updating the model involves simply deploying a new version of the Cloud Function with the updated model. This is a much simpler process than managing pipelines or endpoints.
D is the most voted so far, but the complexity of managing the Dataflow pipeline and the potential latency introduced by the pipeline outweigh the benefits of automatic model updates using WatchFilePattern in this context. Therefore, option A (Cloud Function) remains the most efficient solution.
upvoted 3 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: C
C) Expose the model as Vertex AI End Point
upvoted 1 times
...
pinimichele01
1 year, 6 months ago
Selected Answer: D
agree with guilhermebutzke
upvoted 1 times
...
Yan_X
1 year, 8 months ago
Selected Answer: A
A for me.
upvoted 1 times
...
ddogg
1 year, 9 months ago
Selected Answer: D
Automatic Model Updates: WatchFilePattern automatically detects model changes in Cloud Storage, leading to seamless updates without managing endpoint deployments.
upvoted 4 times
...
pikachu007
1 year, 10 months ago
Selected Answer: A
Low Latency: Serverless Execution: Cloud Functions start up almost instantly, reducing prediction latency compared to alternatives that require longer setup or deployment times. In-Memory Model: Loading the model into memory eliminates disk I/O overhead, further contributing to rapid predictions.
upvoted 2 times
CHARLIE2108
1 year, 9 months ago
Cloud Functions offer low latency but it might not scale well.
upvoted 2 times
...
...
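The appeal of option D's WatchFilePattern is that the running Dataflow job picks up a newly uploaded model file without any redeployment. The sketch below imitates that mechanism in plain Python (fnmatch over a simulated bucket listing, newest timestamp wins); it illustrates the idea only and is not Beam's actual implementation, and the object names are hypothetical:

```python
from fnmatch import fnmatch

def latest_model(listing: dict, pattern: str) -> str:
    """Pick the newest object matching the pattern, as WatchFilePattern
    conceptually does when refreshing the model used by RunInference."""
    matches = {path: ts for path, ts in listing.items() if fnmatch(path, pattern)}
    return max(matches, key=matches.get)

# Simulated Cloud Storage listing: object path -> upload timestamp.
bucket = {
    "models/recommender-v1.tflite": 1,
    "models/recommender-v2.tflite": 2,
}
assert latest_model(bucket, "models/recommender-*.tflite") == "models/recommender-v2.tflite"

# A frequent model update is just a new upload; the running job re-resolves it.
bucket["models/recommender-v3.tflite"] = 3
assert latest_model(bucket, "models/recommender-*.tflite") == "models/recommender-v3.tflite"
```

This is why D minimizes update effort: uploading the new model file is the entire update procedure.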

Topic 1 Question 220

You are collaborating on a model prototype with your team. You need to create a Vertex AI Workbench environment for the members of your team and also limit access to other employees in your project. What should you do?

  • A. 1. Create a new service account and grant it the Notebook Viewer role
    2. Grant the Service Account User role to each team member on the service account
    3. Grant the Vertex AI User role to each team member
    4. Provision a Vertex AI Workbench user-managed notebook instance that uses the new service account
  • B. 1. Grant the Vertex AI User role to the default Compute Engine service account
    2. Grant the Service Account User role to each team member on the default Compute Engine service account
    3. Provision a Vertex AI Workbench user-managed notebook instance that uses the default Compute Engine service account.
  • C. 1. Create a new service account and grant it the Vertex AI User role
    2. Grant the Service Account User role to each team member on the service account
    3. Grant the Notebook Viewer role to each team member.
    4. Provision a Vertex AI Workbench user-managed notebook instance that uses the new service account
  • D. 1. Grant the Vertex AI User role to the primary team member
    2. Grant the Notebook Viewer role to the other team members
    3. Provision a Vertex AI Workbench user-managed notebook instance that uses the primary user’s account
Suggested Answer: C 🗳️

Comments

b7ad1d9
1 month, 2 weeks ago
Selected Answer: A
How is C correct? The users never get the Vertex AI User role to access any resources! Only the service account gets it! Users get the Notebook Viewer role, and some of them get the ability to impersonate using the Service Account User role (how does that help them collaborate on model development?!)
upvoted 1 times
...
fitri001
1 year ago
Selected Answer: C
1. Create a new service account and grant it the Vertex AI User role: This dedicated service account will control access to the Vertex AI Workbench environment.
2. Grant the Service Account User role to each team member on the service account: This grants your team members the ability to use the service account to access the Workbench environment.
3. Grant the Notebook Viewer role to each team member: While they can't modify notebooks, this role allows team members to view and run existing notebooks within the Workbench environment.
4. Provision a Vertex AI Workbench user-managed notebook instance that uses the new service account: By associating the instance with the service account, you ensure only authorized team members (through the service account) can access the environment.
upvoted 1 times
fitri001
1 year ago
A. Notebook Viewer with Service Account User: Granting the Notebook User role on the service account would allow team members to modify notebooks, potentially exceeding your intended access limitations.
B. Default Service Account: Granting access on the default Compute Engine service account is not recommended for security reasons. It's a shared resource and could grant unintended access.
D. Primary User Access: Granting access through a single user account creates a security risk and is not scalable for managing team member permissions.
upvoted 1 times
...
...
guilhermebutzke
1 year, 2 months ago
Selected Answer: C
My Answer: C. This approach ensures that each team member has access to the necessary resources while limiting access to other employees not involved in the project. In A, the Notebook Viewer role only allows viewing, which is not sufficient for working with Vertex AI resources. In B, permissions are granted to the default Compute Engine service account, which may not be ideal for managing access to Vertex AI resources specifically. In D, access is not uniform across team members, which may lead to inconsistencies in resource management.
upvoted 2 times
...
mindriddler
1 year, 3 months ago
Selected Answer: C
Why not A? Mainly because of the fact that we're only giving the role "Notebook Viewer" to the SA, which is not sufficient.
upvoted 2 times
guilhermebutzke
1 year, 2 months ago
In A, the Notebook Viewer role only allows viewing, which is not sufficient for working with Vertex AI resources.
upvoted 2 times
...
...
b1a8fae
1 year, 3 months ago
Selected Answer: A
A and C really sound the same. Only going for A because I understand it gives the lowest level of permission role when creating the project (that is, all members in the Compute Engine Project); and subsequently, grants the User role ONLY to the team members. https://cloud.google.com/iam/docs/overview#resource
upvoted 1 times
tavva_prudhvi
1 year, 3 months ago
Creating a new service account with the Notebook Viewer role would not provide sufficient permissions for managing the Vertex AI Workbench environment, right?
upvoted 1 times
...
...
pikachu007
1 year, 4 months ago
Selected Answer: C
Dedicated Service Account: Creating a separate service account ensures isolation and control over access to Vertex AI resources.
Vertex AI User Role: Granting this role to the service account provides it with necessary permissions to interact with Vertex AI services.
Service Account User Role: Assigning this role to team members allows them to impersonate the service account, enabling them to use its permissions.
Notebook Viewer Role: This role grants team members access to the notebook instance, but not direct Vertex AI resource management.
User-Managed Notebook Instance: This type of instance uses a specific service account, ensuring access control is aligned with the designated service account's permissions.
upvoted 3 times
...
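Option C boils down to three IAM bindings. The sketch below assembles them as plain policy-binding dicts; the project, service account, and member emails are hypothetical placeholders, while the role names are the actual predefined roles involved:

```python
# Sketch of the IAM bindings behind option C.
# Emails are hypothetical placeholders; role names are real predefined roles.
SA = "serviceAccount:workbench-sa@my-project.iam.gserviceaccount.com"
TEAM = ["user:alice@example.com", "user:bob@example.com"]

bindings = [
    # 1. The new service account itself can use Vertex AI.
    {"role": "roles/aiplatform.user", "members": [SA]},
    # 2. Team members may run as (attach to) that service account.
    {"role": "roles/iam.serviceAccountUser", "members": list(TEAM)},
    # 3. Team members can open the Workbench notebook instance.
    {"role": "roles/notebooks.viewer", "members": list(TEAM)},
]

def roles_for(member: str) -> set:
    """Collect the roles a given member holds across the bindings."""
    return {b["role"] for b in bindings if member in b["members"]}
```

Other employees in the project appear in none of the bindings, which is what limits access to the team.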

Topic 1 Question 221

You work at a leading healthcare firm developing state-of-the-art algorithms for various use cases. You have unstructured textual data with custom labels. You need to extract and classify various medical phrases with these labels. What should you do?

  • A. Use the Healthcare Natural Language API to extract medical entities
  • B. Use a BERT-based model to fine-tune a medical entity extraction model
  • C. Use AutoML Entity Extraction to train a medical entity extraction model
  • D. Use TensorFlow to build a custom medical entity extraction model
Suggested Answer: C 🗳️

Comments

b1a8fae
Highly Voted 1 year, 10 months ago
Selected Answer: C
C. "AutoML Entity Extraction for Healthcare allows you to create a custom entity extraction model trained using your own annotated medical text and using your own categories." https://cloud.google.com/healthcare-api/docs/concepts/nlp#choosing_between_the_and
upvoted 11 times
sonicclasps
1 year, 9 months ago
textbook use case as described in the link provided
upvoted 1 times
...
daidai75
1 year, 9 months ago
Full Agreed
upvoted 1 times
...
...
lunalongo
Most Recent 11 months, 2 weeks ago
Selected Answer: C
C is the best option because AutoML Entity Extraction provides the best balance of ease of use, speed, and effectiveness for building a custom medical entity extraction model with your specific labeled data.
A is not suitable for custom labels, B requires deep expertise and is time-consuming, and D is too complex, requiring deep expertise and extensive code.
upvoted 1 times
...
VinaoSilva
1 year, 4 months ago
Selected Answer: C
"unstructured textual data with custom labels " = AutoML Entity Extraction
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: C
Pre-built Functionality: It's a pre-built and managed service within Vertex AI that streamlines the process of building custom entity extraction models. This can save you time and resources compared to building a model from scratch using TensorFlow (option D).
Customizable Labels: AutoML Entity Extraction allows you to define your custom labels for medical phrases, which aligns well with your specific needs.
Unstructured Text Support: It's designed to handle unstructured text data like your medical records.
Faster Experimentation: Compared to a custom BERT-based model (option B), AutoML Entity Extraction often allows for faster experimentation as it automates many hyperparameter tuning aspects.
upvoted 3 times
fitri001
1 year, 6 months ago
A. Healthcare Natural Language API: While this API can extract medical entities like diseases or medications, it might not support the level of customization you need for your specific medical phrases with custom labels.
B. BERT-based Model with Fine-tuning: Fine-tuning a BERT model can be effective, but it requires significant expertise in machine learning and natural language processing. AutoML Entity Extraction provides a more accessible and potentially faster approach for your use case.
D. TensorFlow for Custom Model: Building a custom model with TensorFlow offers maximum control, but it requires a high level of expertise and can be time-consuming, especially for a team that might not specialize in NLP.
upvoted 2 times
...
...
guilhermebutzke
1 year, 9 months ago
Selected Answer: B
My answer: B. Looking at “developing state-of-the-art algorithms for various use cases” in the question, I think the best approach is a BERT-based model. AutoML Entity Extraction could be an approach for a quick start, and the Healthcare Natural Language API might not have your custom labels built in, limiting its effectiveness. A TensorFlow model can be time-consuming and require significant expertise. https://cloud.google.com/healthcare-api/docs/concepts/nlp#choosing_between_the_and
upvoted 2 times
...
Dagogi96
1 year, 9 months ago
Selected Answer: A
A.- "The Healthcare Natural Language API parses unstructured medical text such as medical records or insurance claims. It then generates a structured data representation of the medical knowledge entities stored in these data sources for downstream analysis and automatio"
upvoted 3 times
...
pikachu007
1 year, 10 months ago
Selected Answer: B
A. Healthcare Natural Language API: While convenient, it lacks the customization capabilities for fine-tuning with custom labels, potentially limiting accuracy for your specific needs.
C. AutoML Entity Extraction: It's generally well-suited for common entity types, but its pre-defined label set might not accommodate the full range of medical entities and relationships you need to extract.
D. TensorFlow Custom Model: Building a model from scratch requires significant expertise, time, and resources, often less efficient than leveraging the power of pre-trained BERT models.
upvoted 2 times
...
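For option C, AutoML Entity Extraction is trained from annotated examples, typically supplied as JSONL where each line carries a text snippet plus character-offset annotations with your custom labels. The sketch below builds one such line; the field names follow the AutoML Natural Language entity-extraction format (verify against the current docs before use), and the phrase, labels, and offsets are illustrative:

```python
import json

# One annotated training example in the AutoML entity-extraction JSONL style.
# Labels and text are illustrative; verify field names against the docs.
text = "Patient was prescribed 20mg atorvastatin for hyperlipidemia."

def annotation(label: str, phrase: str) -> dict:
    """Annotate one phrase with a custom label using character offsets."""
    start = text.index(phrase)
    return {
        "display_name": label,  # your custom label
        "text_extraction": {
            "text_segment": {"start_offset": start,
                             "end_offset": start + len(phrase)}},
    }

example = {
    "text_snippet": {"content": text},
    "annotations": [annotation("MEDICATION", "atorvastatin"),
                    annotation("CONDITION", "hyperlipidemia")],
}
line = json.dumps(example)  # one line of the training JSONL
```

Each label here is user-defined, which is the key difference from the fixed entity types of the Healthcare Natural Language API (option A).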

Topic 1 Question 222

You developed a custom model by using Vertex AI to predict your application's user churn rate. You are using Vertex AI Model Monitoring for skew detection. The training data stored in BigQuery contains two sets of features - demographic and behavioral. You later discover that two separate models trained on each set perform better than the original model. You need to configure a new model monitoring pipeline that splits traffic among the two models. You want to use the same prediction-sampling-rate and monitoring-frequency for each model. You also want to minimize management effort. What should you do?

  • A. Keep the training dataset as is. Deploy the models to two separate endpoints, and submit two Vertex AI Model Monitoring jobs with appropriately selected feature-thresholds parameters.
  • B. Keep the training dataset as is. Deploy both models to the same endpoint and submit a Vertex AI Model Monitoring job with a monitoring-config-from-file parameter that accounts for the model IDs and feature selections.
  • C. Separate the training dataset into two tables based on demographic and behavioral features. Deploy the models to two separate endpoints, and submit two Vertex AI Model Monitoring jobs.
  • D. Separate the training dataset into two tables based on demographic and behavioral features. Deploy both models to the same endpoint, and submit a Vertex AI Model Monitoring job with a monitoring-config-from-file parameter that accounts for the model IDs and training datasets.
Suggested Answer: B 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 9 months ago
Selected Answer: B
My answer: B. If you're using Vertex AI Model Monitoring for skew detection and your data is stored in BigQuery, it's not strictly necessary to separate the data into two tables. Vertex AI Model Monitoring can analyze each feature individually to detect skew, so there is no need to split the data. The `monitoring-config-from-file` parameter then lets you specify a unique configuration for each model, including its ID and training-data information. This ensures targeted monitoring and analysis within a single monitoring job.
upvoted 9 times
...
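For readers unfamiliar with the `monitoring-config-from-file` parameter discussed above: it accepts a file describing one monitoring job with a separate objective config per deployed model ID, while the shared prediction-sampling-rate and monitoring-frequency are set once on the job itself. A minimal hypothetical sketch of such a file (field names follow the ModelDeploymentMonitoringJob API shape; all IDs, URIs, feature names, and thresholds are placeholders, so verify against the current Vertex AI reference before use):

```yaml
# Hypothetical config: one monitoring job covering two models on the same endpoint.
modelDeploymentMonitoringObjectiveConfigs:
  - deployedModelId: "1111111111"            # placeholder: demographic model
    objectiveConfig:
      trainingDataset:
        bigquerySource:
          inputUri: "bq://my-project.my_dataset.churn_training"   # placeholder
        targetField: "churned"
      trainingPredictionSkewDetectionConfig:
        skewThresholds:
          feature_a: {value: 0.3}
  - deployedModelId: "2222222222"            # placeholder: behavioral model
    objectiveConfig:
      trainingDataset:
        bigquerySource:
          inputUri: "bq://my-project.my_dataset.churn_training"   # placeholder
        targetField: "churned"
      trainingPredictionSkewDetectionConfig:
        skewThresholds:
          feature_b: {value: 0.3}
```

Because both objective configs live in one job, a single sampling rate and monitoring frequency applies to both models, which is what the question asks for.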
Dirtie_Sinkie
Most Recent 1 year, 1 month ago
Selected Answer: D
My vote is D, have to separate the training dataset
upvoted 1 times
...
bfdf9c8
1 year, 3 months ago
Selected Answer: D
The question mentions skew; you need to configure model monitoring with this in mind, so the better option is to separate the data into two different tables to use skew detection.
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: B
Reduced Management Effort: You only need to deploy and monitor a single endpoint, minimizing complexity compared to managing two separate endpoints and monitoring jobs (Option A and C). Efficient Data Usage: Maintaining the original training dataset simplifies data management and avoids the need to split it into separate tables (Option C and D). Granular Monitoring: The monitoring-config-from-file parameter allows you to specify configurations for each model within the same monitoring job. You can define the model ID and the features to monitor for potential skew or drift for each model independently.
upvoted 3 times
fitri001
1 year, 6 months ago
A. Separate Endpoints and Monitoring Jobs: This approach requires managing two endpoints and monitoring jobs, increasing complexity. C. Separate Training Data and Separate Endpoints: While it separates training data, it requires managing separate endpoints and monitoring jobs, similar to option A. Additionally, splitting the data might be unnecessary for monitoring purposes in this scenario. D. Separate Training Data (Optional) and Single Endpoint: Splitting the data (optional) adds complexity, and while you can use a single endpoint, defining configurations for each model within the monitoring job is more efficient using the monitoring-config-from-file parameter (option B).
upvoted 2 times
...
...
pinimichele01
1 year, 7 months ago
Selected Answer: B
I don't understand why it is necessary to separate dataset when there is Vertex AI Monitoring
upvoted 1 times
SausageMuffins
1 year, 5 months ago
For training-skew detection, you require the training dataset. Hence, splitting the original dataset into the two feature sets would make management easier later on. Correct me if I'm wrong, but you would have to update the monitoring job when you retrain the model to keep it in sync. Hence splitting makes sense. Agreed that a single endpoint would be easier to manage than two. As a result, my answer is D.
upvoted 1 times
...
...
shuvs
1 year, 7 months ago
Selected Answer: D
Not B, as training on separate datasets is recommended.
upvoted 1 times
pinimichele01
1 year, 7 months ago
why? i don't understand sorry
upvoted 1 times
...
...
Yan_X
1 year, 8 months ago
Selected Answer: D
D. Separate the data into two tables to make sure both models are trained with the most relevant data.
upvoted 1 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: D
D. You need to split the training dataset for each respective model. Furthermore, you only need to control for 2 differences between models in monitoring-config-from-file: model ID, and training set. Feature selection should be the same in both models.
upvoted 1 times
vaibavi
1 year, 9 months ago
Why not B?
upvoted 2 times
...
...
shadz10
1 year, 10 months ago
Selected Answer: D
D - makes more sense: the two models are trained separately and more accurately, and this option also submits a Vertex AI Model Monitoring job with a monitoring-config-from-file parameter, which enables skew detection to work for each model.
upvoted 2 times
...
pikachu007
1 year, 10 months ago
Selected Answer: B
A. Separate Endpoints: This approach involves more management overhead and potentially complicates monitoring configurations. C. Separate Datasets: Splitting the dataset into two tables is unnecessary for model monitoring and could introduce data management complexities. D. Separate Datasets, Same Endpoint: While feasible, this option lacks the flexibility of granular feature control provided by monitoring-config-from-file.
upvoted 4 times
...

Topic 1 Question 223


You work for a pharmaceutical company based in Canada. Your team developed a BigQuery ML model to predict the number of flu infections for the next month in Canada. Weather data is published weekly, and flu infection statistics are published monthly. You need to configure a model retraining policy that minimizes cost. What should you do?

  • A. Download the weather and flu data each week. Configure Cloud Scheduler to execute a Vertex AI pipeline to retrain the model weekly.
  • B. Download the weather and flu data each month. Configure Cloud Scheduler to execute a Vertex AI pipeline to retrain the model monthly.
  • C. Download the weather and flu data each week. Configure Cloud Scheduler to execute a Vertex AI pipeline to retrain the model every month.
  • D. Download the weather data each week, and download the flu data each month. Deploy the model to a Vertex AI endpoint with feature drift monitoring, and retrain the model if a monitoring alert is detected.
Suggested Answer: D 🗳️

Comments

fitri001
Highly Voted 1 year, 6 months ago
Selected Answer: D
Weather Data Update: Downloading weather data weekly captures the latest trends potentially influencing flu infections. Flu Data Update: Downloading flu statistics monthly aligns with the data publication schedule and avoids unnecessary processing for data that might not have changed. Feature Drift Monitoring: Vertex AI endpoint monitoring helps identify significant changes in the weather data distribution (feature drift) over time. Retrain Based on Alerts: Retraining the model is triggered only when feature drift is detected, ensuring the model stays relevant without unnecessary retraining cycles.
upvoted 6 times
fitri001
1 year, 6 months ago
A. Weekly Retraining: Retraining the model every week incurs processing costs even if the flu data (target variable) hasn't changed, potentially leading to wasted resources. B. Monthly Retraining: While cheaper than option A, it might miss capturing the impact of recent weather changes on flu infections. C. Weekly Data Download, Monthly Retraining: This approach downloads weather data more frequently than necessary and still incurs retraining costs even if feature drift hasn't occurred.
upvoted 1 times
Omi_04040
11 months ago
Why does a model that does batch prediction need to be deployed to an endpoint? The right answer seems to be B.
upvoted 1 times
...
...
...
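The feature-drift monitoring that option D relies on boils down to comparing the serving-time feature distribution against the training distribution and retraining only when they diverge. A minimal, self-contained sketch using the Population Stability Index, a common drift score (the 0.2 threshold is an illustrative convention, not a Vertex AI default):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of a numeric feature."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # small additive smoothing so empty bins don't blow up the log
        return [(c + 1e-6) / len(sample) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def should_retrain(train_sample, serving_sample, threshold=0.2):
    """Retrain only when drift between training and serving data exceeds the threshold."""
    return psi(train_sample, serving_sample) > threshold
```

Identical distributions score near zero (no retrain, no cost); a shifted serving distribution pushes the score well above the threshold and triggers retraining, which is the cost-minimizing behavior the answer describes.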
pinimichele01
Most Recent 1 year, 7 months ago
Selected Answer: D
minimize cost
upvoted 1 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: D
My Answer: D Even though the model predicts values for the next month, it is necessary to consume weekly data because the model's output could change based on new weekly data. Therefore, it is necessary to download data weekly and monthly. Furthermore, it is not necessary to retrain the model if the feature distribution remains unchanged.
upvoted 1 times
...
b1a8fae
1 year, 10 months ago
Selected Answer: D
D. This way, cost is minimized by only retraining when feature drift takes place.
upvoted 4 times
...
pikachu007
1 year, 10 months ago
Selected Answer: D
Selective Retraining: Retraining occurs only when necessary, triggered by feature drift alerts, reducing cloud resource usage and associated costs. Efficient Data Utilization: Weather data is downloaded weekly to capture potential changes, but model retraining waits for monthly flu data, ensuring model relevance without excessive updates. Early Drift Detection: Vertex AI's feature drift monitoring proactively identifies model performance degradation, prompting timely retraining to maintain accuracy.
upvoted 2 times
...

Topic 1 Question 224


You are building an MLOps platform to automate your company’s ML experiments and model retraining. You need to organize the artifacts for dozens of pipelines. How should you store the pipelines’ artifacts?

  • A. Store parameters in Cloud SQL, and store the models’ source code and binaries in GitHub.
  • B. Store parameters in Cloud SQL, store the models’ source code in GitHub, and store the models’ binaries in Cloud Storage.
  • C. Store parameters in Vertex ML Metadata, store the models’ source code in GitHub, and store the models’ binaries in Cloud Storage.
  • D. Store parameters in Vertex ML Metadata and store the models’ source code and binaries in GitHub.
Suggested Answer: C 🗳️

Comments

OpenKnowledge
2 months, 1 week ago
Selected Answer: C
For storing machine learning model source code and binaries, Google recommends using Git for code versioning and Cloud Storage with an integrated Model Registry for binaries and artifacts.
upvoted 1 times
...
fitri001
1 year ago
Selected Answer: C
Vertex ML Metadata: This service is specifically designed to store and track metadata for ML pipelines, including parameters. It provides a centralized location to manage and query pipeline execution details, making it ideal for dozens of pipelines. Cloud Storage: This is a scalable and cost-effective storage solution for model binaries. It integrates well with Vertex AI and other cloud services. GitHub: While not a Google Cloud service, it's a popular version control system well-suited for storing and managing your models' source code, particularly for collaboration among team members.
upvoted 3 times
fitri001
1 year ago
A. Cloud SQL for Parameters: While Cloud SQL is a relational database service, Vertex ML Metadata offers a dedicated solution for ML metadata management, including parameters, providing better integration and functionality within the MLOps context. D. Vertex ML Metadata for Source Code and Binaries: Vertex ML Metadata is primarily focused on ML pipeline metadata and experiment tracking. Cloud Storage is a more appropriate service for storing large binary files like model artifacts.
upvoted 1 times
...
...
pinimichele01
1 year, 1 month ago
Selected Answer: C
shadz10
upvoted 1 times
...
shadz10
1 year, 3 months ago
Selected Answer: C
https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build
upvoted 2 times
...
pikachu007
1 year, 4 months ago
Selected Answer: C
A. Cloud SQL and GitHub: Cloud SQL isn't designed for ML metadata management, potentially leading to challenges in tracking experiment details and lineage. B. Cloud SQL, GitHub, and Cloud Storage: While viable, this approach misses the benefits of Vertex ML Metadata for organized ML artifact management. D. Vertex ML Metadata and GitHub: Storing model binaries in GitHub can be inefficient for large files and might incur higher storage costs.
upvoted 2 times
...

Topic 1 Question 225


You work for a telecommunications company. You’re building a model to predict which customers may fail to pay their next phone bill. The purpose of this model is to proactively offer at-risk customers assistance such as service discounts and bill deadline extensions. The data is stored in BigQuery and the predictive features that are available for model training include:

- Customer_id
- Age
- Salary (measured in local currency)
- Sex
- Average bill value (measured in local currency)
- Number of phone calls in the last month (integer)
- Average duration of phone calls (measured in minutes)

You need to investigate and mitigate potential bias against disadvantaged groups, while preserving model accuracy.

What should you do?

  • A. Determine whether there is a meaningful correlation between the sensitive features and the other features. Train a BigQuery ML boosted trees classification model and exclude the sensitive features and any meaningfully correlated features.
  • B. Train a BigQuery ML boosted trees classification model with all features. Use the ML.GLOBAL_EXPLAIN method to calculate the global attribution values for each feature of the model. If the feature importance value for any of the sensitive features exceeds a threshold, discard the model and train without this feature.
  • C. Train a BigQuery ML boosted trees classification model with all features. Use the ML.EXPLAIN_PREDICT method to calculate the attribution values for each feature for each customer in a test set. If for any individual customer, the importance value for any feature exceeds a predefined threshold, discard the model and train the model again without this feature.
  • D. Define a fairness metric that is represented by accuracy across the sensitive features. Train a BigQuery ML boosted trees classification model with all features. Use the trained model to make predictions on a test set. Join the data back with the sensitive features, and calculate a fairness metric to investigate whether it meets your requirements.
Suggested Answer: D 🗳️

Comments

fitri001
1 year ago
Selected Answer: D
Fairness Metric: Defining a metric like parity (equal accuracy) or calibration (similar predicted probabilities) across sensitive features like age, sex, or salary allows you to quantify potential bias. Model Training with All Features (Initially): Training the model with all features provides a baseline performance and allows you to identify potentially biased features later. Test Set Predictions: Making predictions on a held-out test set ensures the evaluation is based on unseen data and avoids overfitting. Joining Back Sensitive Features: Reintroducing sensitive features after prediction allows you to calculate fairness metrics for different customer groups. Iterative Refinement: Based on the fairness metric results, you can determine if further mitigation strategies are needed.
upvoted 3 times
fitri001
1 year ago
A. Excluding Features Based on Correlation: While correlated features might indicate bias, simply excluding them can discard valuable information and potentially reduce model accuracy. B. Global Attribution for Feature Removal: Using global feature importance might not reveal bias impacting specific customer groups. Additionally, discarding a feature solely based on importance could affect model performance. C. Individual Attribution for Model Discarding: While individual attribution can identify per-customer bias, discarding the model entirely based on a single instance might be overly cautious and lead to starting from scratch frequently.
upvoted 1 times
...
...
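The join-and-evaluate step in option D can be sketched in plain Python: compute accuracy separately for each value of a sensitive feature, then report the largest gap between groups (function names and the illustrative data are my own, not from the exam):

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, sensitive):
    """Accuracy computed separately for each value of a sensitive feature."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, sensitive):
        totals[g] += 1
        hits[g] += int(t == p)
    return {g: hits[g] / totals[g] for g in totals}

def max_accuracy_gap(y_true, y_pred, sensitive):
    """A simple fairness metric: the largest accuracy difference between groups."""
    acc = accuracy_by_group(y_true, y_pred, sensitive)
    return max(acc.values()) - min(acc.values())
```

In practice the predictions table would be joined back to the sensitive features in BigQuery, but the per-group accuracy comparison is the same idea.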
pinimichele01
1 year, 1 month ago
Selected Answer: D
https://cloud.google.com/vertex-ai/docs/evaluation/intro-evaluation-fairness
upvoted 1 times
...
prtikare
1 year, 2 months ago
Answer is A
upvoted 1 times
...
shadz10
1 year, 3 months ago
Selected Answer: D
https://cloud.google.com/vertex-ai/docs/evaluation/intro-evaluation-fairness
upvoted 1 times
...
pikachu007
1 year, 4 months ago
Selected Answer: D
Direct Bias Assessment: It directly measures model fairness using a relevant metric, providing clear insights into potential issues. Preserving Information: It avoids prematurely removing features, potentially capturing valuable predictive signals while mitigating bias. Aligning with Goals: It allows tailoring the fairness metric to specific ethical and business objectives.
upvoted 2 times
...

Topic 1 Question 226


You recently trained a XGBoost model that you plan to deploy to production for online inference. Before sending a predict request to your model’s binary, you need to perform a simple data preprocessing step. This step exposes a REST API that accepts requests in your internal VPC Service Controls and returns predictions. You want to configure this preprocessing step while minimizing cost and effort. What should you do?

  • A. Store a pickled model in Cloud Storage. Build a Flask-based app, package the app in a custom container image, and deploy the model to Vertex AI Endpoints.
  • B. Build a Flask-based app, package the app and a pickled model in a custom container image, and deploy the model to Vertex AI Endpoints.
  • C. Build a custom predictor class based on XGBoost Predictor from the Vertex AI SDK, package it and a pickled model in a custom container image based on a Vertex built-in image, and deploy the model to Vertex AI Endpoints.
  • D. Build a custom predictor class based on XGBoost Predictor from the Vertex AI SDK, and package the handler in a custom container image based on a Vertex built-in container image. Store a pickled model in Cloud Storage, and deploy the model to Vertex AI Endpoints.
Suggested Answer: D 🗳️

Comments

fitri001
Highly Voted 1 year, 6 months ago
Selected Answer: D
why not c? While it utilizes the XGBoost Predictor, packaging the pickled model in the container increases image size and requires redeploying the container for model updates.
upvoted 5 times
fitri001
1 year, 6 months ago
why D? Reduced Code Footprint: You only need to write the custom predictor logic, not a full Flask application. This minimizes development effort and container size. Leverages Vertex AI Features: By using the XGBoost Predictor from the Vertex AI SDK, you benefit from pre-built functionality for handling XGBoost models. Cost-Effective Deployment: Utilizing Vertex built-in container images reduces the need for custom image maintenance and potentially lowers container runtime costs. Separate Model Storage: Storing the pickled model in Cloud Storage keeps the model separate from the prediction logic, allowing for easier model updates without redeploying the entire container.
upvoted 3 times
...
...
lunalongo
Most Recent 11 months, 2 weeks ago
Selected Answer: B
Option B is simpler (the Flask app handles preprocessing directly) and less costly (the model is stored within the container). Storing the pickled model in Cloud Storage adds network calls during prediction, increasing latency and cost. The XGBoost Predictor (C & D) adds unneeded complexity to a simple preprocessing task.
upvoted 2 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: D
My Answer: D. This option uses the Vertex AI SDK to build a custom predictor class, which allows easy integration with the XGBoost model. Packaging the handler in a custom container image based on a Vertex built-in container image ensures compatibility and smooth deployment. Storing the pickled model in Cloud Storage provides a scalable and reliable way to access the model, and deploying to Vertex AI Endpoints allows easy management and scaling of inference requests while minimizing cost and effort. The main difference between C and D is where the model is saved; it is good practice to save models in GCS because of separation of concerns, flexibility, and reduced image size.
upvoted 2 times
...
pikachu007
1 year, 10 months ago
Selected Answer: D
Minimal Custom Code: Leverages the pre-built XGBoost Predictor class for core model prediction, reducing development effort and potential errors. Optimized Container Image: Utilizes a Vertex built-in container image, pre-configured for efficient model serving and compatibility with Vertex AI Endpoints. Separated Model Storage: Stores the model in Cloud Storage, reducing container image size and simplifying model updates independently of the container. VPC Service Controls: Vertex AI Endpoints support VPC Service Controls, ensuring adherence to internal traffic restrictions.
upvoted 3 times
...

Topic 1 Question 227


You work at a bank. You need to develop a credit risk model to support loan application decisions. You decide to implement the model by using a neural network in TensorFlow. Due to regulatory requirements, you need to be able to explain the model’s predictions based on its features. When the model is deployed, you also want to monitor the model’s performance over time. You decided to use Vertex AI for both model development and deployment. What should you do?

  • A. Use Vertex Explainable AI with the sampled Shapley method, and enable Vertex AI Model Monitoring to check for feature distribution drift.
  • B. Use Vertex Explainable AI with the sampled Shapley method, and enable Vertex AI Model Monitoring to check for feature distribution skew.
  • C. Use Vertex Explainable AI with the XRAI method, and enable Vertex AI Model Monitoring to check for feature distribution drift.
  • D. Use Vertex Explainable AI with the XRAI method, and enable Vertex AI Model Monitoring to check for feature distribution skew.
Suggested Answer: A 🗳️

Comments

b1a8fae
Highly Voted 1 year, 3 months ago
Selected Answer: A
Not image -> not XRAI. Performance over time -> drift, not skew.
upvoted 10 times
...
fitri001
Most Recent 1 year ago
Selected Answer: A
why not the others? B. Feature Distribution Skew: While skew can be relevant, drift is generally a more significant concern for credit risk models. Drift indicates a change in the underlying data distribution, potentially impacting model performance. C & D. XRAI Method: XRAI (Explainable AI for Images) is specifically designed for interpreting image classification models. It wouldn't be the most effective choice for a neural network-based credit risk model working with tabular data.
upvoted 3 times
fitri001
1 year ago
Vertex Explainable AI: This is a built-in Vertex AI feature that helps understand how features contribute to model predictions. Sampled Shapley Method: This is a well-suited method for explaining complex models like neural networks. It provides insights into feature importance without requiring retraining the entire model.
upvoted 1 times
...
...
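For intuition on why sampled Shapley suits tabular models like this one: it estimates each feature's attribution by averaging its marginal contribution to the prediction over random feature orderings. A self-contained Monte Carlo sketch of the idea (not Vertex AI's implementation):

```python
import random

def sampled_shapley(model_fn, instance, baseline, n_samples=200, seed=0):
    """Monte Carlo estimate of per-feature Shapley attributions.

    model_fn: callable taking a feature list and returning a scalar score.
    baseline: the reference input attributions are measured against.
    """
    rng = random.Random(seed)
    n = len(instance)
    attrib = [0.0] * n
    for _ in range(n_samples):
        order = list(range(n))
        rng.shuffle(order)           # a random order in which features "arrive"
        current = list(baseline)
        prev = model_fn(current)
        for i in order:
            current[i] = instance[i]  # switch feature i from baseline to actual value
            new = model_fn(current)
            attrib[i] += new - prev   # marginal contribution of feature i
            prev = new
    return [a / n_samples for a in attrib]
```

A useful property for the regulatory-explainability requirement in the question: the attributions sum to the difference between the prediction and the baseline prediction, so each loan decision can be decomposed feature by feature.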
winston9
1 year, 4 months ago
Selected Answer: A
Explainable AI with the XRAI method is for unstructured, image region analysis, in this case we use structured data for loan approval analysis.
upvoted 2 times
...

Topic 1 Question 228


You are investigating the root cause of a misclassification error made by one of your models. You used Vertex AI Pipelines to train and deploy the model. The pipeline reads data from BigQuery, creates a copy of the data in Cloud Storage in TFRecord format, trains the model in Vertex AI Training on that copy, and deploys the model to a Vertex AI endpoint. You have identified the specific version of that model that misclassified, and you need to recover the data this model was trained on. How should you find that copy of the data?

  • A. Use Vertex AI Feature Store. Modify the pipeline to use the feature store, and ensure that all training data is stored in it. Search the feature store for the data used for the training.
  • B. Use the lineage feature of Vertex AI Metadata to find the model artifact. Determine the version of the model and identify the step that creates the data copy and search in the metadata for its location.
  • C. Use the logging features in the Vertex AI endpoint to determine the timestamp of the model’s deployment. Find the pipeline run at that timestamp. Identify the step that creates the data copy, and search in the logs for its location.
  • D. Find the job ID in Vertex AI Training corresponding to the training for the model. Search in the logs of that job for the data used for the training.
Suggested Answer: B 🗳️

Comments

fitri001
Highly Voted 1 year ago
Selected Answer: B
Vertex AI Metadata Lineage: This feature tracks the relationships between pipeline components and the artifacts they produce. By identifying the model version's lineage, you can pinpoint the specific pipeline run that generated it. Data Copy Step: Within the pipeline run, locate the step responsible for creating the data copy in TFRecord format for training. Metadata Search: Vertex AI Metadata likely stores information about the data copy's location in Cloud Storage, allowing you to access it.
upvoted 5 times
fitri001
1 year ago
A. Feature Store: Feature Store is designed for managing feature engineering and serving preprocessed features, not necessarily raw training data. While it could be a good practice for future pipelines, it wouldn't help recover historical data. C. Endpoint Logs: Endpoint logs primarily focus on model deployment details and might not provide information about the specific training data used for a particular version. D. Training Job Logs: Training job logs might contain references to the data used, but they might not be as detailed or structured as Vertex AI Metadata lineage, making it harder to pinpoint the exact data copy location.
upvoted 1 times
...
...
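Conceptually, the lineage lookup in option B is a backwards walk over producer edges, from the model artifact to the dataset artifact that fed its training step. A toy sketch with a hypothetical in-memory lineage graph (the real Vertex ML Metadata API exposes this through lineage queries over contexts, executions, and artifacts, not a dict):

```python
def find_upstream_artifact(lineage, start, artifact_type):
    """Walk producer edges backwards from an artifact to the nearest one of a type.

    lineage: dict mapping artifact name -> {"type": ..., "inputs": [...]}
    (a hypothetical shape standing in for Vertex ML Metadata records).
    """
    frontier = [start]
    seen = set()
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        info = lineage[node]
        if info["type"] == artifact_type and node != start:
            return node
        frontier.extend(info["inputs"])
    return None
```

With the pipeline from the question, walking back from the misclassifying model version would land on the TFRecord copy's artifact, whose metadata records its Cloud Storage location.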
pinimichele01
Most Recent 1 year, 1 month ago
Selected Answer: B
agree with shadz10 and pikachu007
upvoted 1 times
...
shadz10
1 year, 3 months ago
Selected Answer: B
https://cloud.google.com/vertex-ai/docs/ml-metadata/introduction
upvoted 2 times
...
pikachu007
1 year, 4 months ago
Selected Answer: B
A. Feature Store: While useful for managing features, it might not store complete training datasets, and modifying the pipeline would not help recover historical data. C. Endpoint Logs and Pipeline Run: This approach involves more manual searching and might be less precise for identifying the exact data copy. D. Training Job Logs: Training job logs might not reliably contain complete data paths or might be purged after a certain period.
upvoted 3 times
...

Topic 1 Question 229


You work for a manufacturing company. You need to train a custom image classification model to detect product defects at the end of an assembly line. Although your model is performing well, some images in your holdout set are consistently mislabeled with high confidence. You want to use Vertex AI to understand your model’s results. What should you do?

  • A. Configure feature-based explanations by using Integrated Gradients. Set visualization type to PIXELS, and set clip_percent_upperbound to 95.
  • B. Create an index by using Vertex AI Matching Engine. Query the index with your mislabeled images.
  • C. Configure feature-based explanations by using XRAI. Set visualization type to OUTLINES, and set polarity to positive.
  • D. Configure example-based explanations. Specify the embedding output layer to be used for the latent space representation.
Suggested Answer: D 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 8 months ago
My Answer: A According to this documentation: https://cloud.google.com/vertex-ai/docs/explainable-ai/visualization-settings This option A aligns with using Integrated Gradients, which is suitable for feature-based explanations. Setting the visualization type to PIXELS allows for per-pixel attribution, which can help in understanding the specific regions of the image influencing the model's decision. Additionally, setting the clip_percent_upperbound parameter to 95 helps in filtering out noise and focusing on areas of strong attribution, which is crucial for understanding mislabeled images with high confidence. Option C suggests using XRAI for feature-based explanations and setting the visualization type to OUTLINES, along with setting the polarity to positive. However, based on the provided documentation, XRAI is recommended to have its visualization type set to PIXELS, not OUTLINES.
upvoted 6 times
...
dija123
Most Recent 1 month ago
Selected Answer: D
Example-based explanations are designed for exactly this purpose. Instead of showing which pixels were important (like feature-based explanations), they answer the question: "Which examples from the training data did the model consider most similar to this new image when making its decision?"
upvoted 1 times
...
rajshiv
11 months, 1 week ago
Selected Answer: A
A is the best answer among the choices. Integrated gradient and pixel visualization is the best alternative among the choices given.
upvoted 3 times
...
YangG
1 year ago
Selected Answer: D
The goal is to understand why the model is making specific mistakes, so example-based explanations make sense to me.
upvoted 2 times
...
eico
1 year, 2 months ago
Selected Answer: D
https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#example-based "Improve your data or model: One of the core use cases for example-based explanations is helping you understand why your model made certain mistakes in its predictions, and using those insights to improve your data or model. [...] For example, suppose we have a model that classifies images as either a bird or a plane, and that it is misclassifying the following bird as a plane with high confidence. You can use Example-based explanations to retrieve similar images from the training set to figure out what is happening." Not A: Integrated Gradients is recommended for low-contrast images, such as X-rays https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#compare-methods Not C: Cannot set Outlines for XRAI https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/visualizing-explanations
upvoted 4 times
...
VipinSingla
1 year, 8 months ago
Selected Answer: D
Improve your data or model: One of the core use cases for example-based explanations is helping you understand why your model made certain mistakes in its predictions, and using those insights to improve your data or model. https://cloud.google.com/vertex-ai/docs/explainable-ai/overview
upvoted 4 times
...
sonicclasps
1 year, 9 months ago
Selected Answer: A
Although XRAI could be an option, it doesn't allow you to set those options, so the only other answer is A https://cloud.google.com/vertex-ai/docs/explainable-ai/visualization-settings#visualization_options
upvoted 3 times
vaibavi
1 year, 9 months ago
Why not it's D? https://cloud.google.com/vertex-ai/docs/explainable-ai/overview
upvoted 1 times
vaibavi
1 year, 9 months ago
For example, suppose we have a model that classifies images as either a bird or a plane, and that it is misclassifying the following bird as a plane with high confidence. You can use Example-based explanations to retrieve similar images from the training set to figure out what is happening.
upvoted 2 times
sonicclasps
1 year, 9 months ago
yes you are correct, but having to specify the output layer to be used is definitely no guarantee that you'll get examples that are easily interpretable (imo)
upvoted 1 times
...
...
...
...
shadz10
1 year, 10 months ago
Selected Answer: A
Going with A Not c - For XRAI, Pixels is the default setting and shows areas of attribution. Outlines is not recommended for XRAI. https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/visualizing-explanations
upvoted 3 times
...

Topic 1 Question 230

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 230 discussion

You are training models in Vertex AI by using data that spans across multiple Google Cloud projects. You need to find, track, and compare the performance of the different versions of your models. Which Google Cloud services should you include in your ML workflow?

  • A. Dataplex, Vertex AI Feature Store, and Vertex AI TensorBoard
  • B. Vertex AI Pipelines, Vertex AI Feature Store, and Vertex AI Experiments
  • C. Dataplex, Vertex AI Experiments, and Vertex AI ML Metadata
  • D. Vertex AI Pipelines, Vertex AI Experiments, and Vertex AI Metadata
Suggested Answer: D 🗳️

Comments

fitri001
Highly Voted 1 year, 6 months ago
Selected Answer: D
Why not the others? A. Dataplex & Vertex AI Feature Store: While Dataplex can manage data across projects, it's not directly tied to model versioning and comparison. Feature Store focuses on feature engineering, not model version management. B. Vertex AI Feature Store & Vertex AI TensorBoard: Similar to option A, Feature Store isn't directly involved in model version tracking, and TensorBoard is primarily for visualizing training data and metrics, not model version comparison across projects. C. Dataplex & Vertex AI ML Metadata: Dataplex, as mentioned earlier, doesn't directly address model version comparison. While ML Metadata tracks lineage, it might not have the experiment management features of Vertex AI Experiments.
upvoted 6 times
fitri001
1 year, 6 months ago
Vertex AI Pipelines (Optional): While optional, pipelines can automate your training workflow, including data access from BigQuery tables in different projects. It helps orchestrate the training process across projects. Vertex AI Experiments: This service is crucial for tracking and comparing the performance of different model versions. It allows you to: Run multiple training experiments with different configurations. Track experiment metrics like accuracy, precision, recall, etc. Compare the performance of different model versions trained in various projects. Vertex AI Metadata: This service provides a centralized view of your ML workflow, including model lineage and versioning. It's particularly helpful in your scenario because: It tracks the origin and relationships between models, including the specific data used for training, regardless of the project. You can see how different model versions (potentially trained across projects) relate to each other and the data they were trained on.
upvoted 3 times
...
...
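The run tracking and comparison that fitri001 attributes to Vertex AI Experiments can be sketched with the google-cloud-aiplatform SDK. The cloud call below is deferred inside a function (it needs GCP credentials; project and experiment names are placeholders), while the local `best_run` helper illustrates the cross-run comparison the Experiments console does for you:

```python
# Sketch of option D's experiment tracking, assuming the
# google-cloud-aiplatform SDK; project/location/experiment names
# are placeholders.

def log_run_to_vertex(experiment: str, run_name: str, params: dict, metrics: dict):
    """Log one training run to Vertex AI Experiments (requires GCP credentials)."""
    from google.cloud import aiplatform  # deferred: cloud-only dependency
    aiplatform.init(project="my-project", location="us-central1",
                    experiment=experiment)
    aiplatform.start_run(run_name)
    aiplatform.log_params(params)
    aiplatform.log_metrics(metrics)
    aiplatform.end_run()

def best_run(runs: list, metric: str) -> str:
    """Pick the run with the highest value for `metric` -- the comparison
    Vertex AI Experiments surfaces across model versions."""
    return max(runs, key=lambda r: r["metrics"][metric])["name"]

# Illustrative run records, as if pulled from two projects:
runs = [
    {"name": "xgb-v1", "metrics": {"auc": 0.81}},
    {"name": "dnn-v2", "metrics": {"auc": 0.86}},
]
print(best_run(runs, "auc"))  # dnn-v2
```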
el_vampiro
Most Recent 2 months ago
Selected Answer: C
The "find" and "data across projects" keywords indicate you need Dataplex.
upvoted 1 times
...
Fer660
2 months, 1 week ago
Selected Answer: C
It's C. The discussion below tells me only one thing: Google's product lineup remains unclear, even for folks who have been studying it for weeks. I don't think we are all dumb....
upvoted 1 times
...
Umanga
11 months, 2 weeks ago
Selected Answer: C
Ans : C 1. Dataplex : https://cloud.google.com/vertex-ai/docs/model-registry/introduction#search_and_discover_models_usings_service 2. Vertex AI Experiments: This service is crucial for tracking and comparing the performance of different model versions. It allows you to: Run multiple training experiments with different configurations. Track experiment metrics like accuracy, precision, recall, etc. Compare the performance of different model versions trained in various projects. 3. Vertex AI Metadata: This service provides a centralized view of your ML workflow, including model lineage and versioning. It's particularly helpful in your scenario because:
upvoted 3 times
...
Aastha_Vashist
1 year, 7 months ago
Selected Answer: D
went with D
upvoted 2 times
...
Yan_X
1 year, 8 months ago
Selected Answer: D
I would go with option D. No Vertex AI pipeline no orchestration. So rule out A and C. Vertex AI Metadata is for 'spans across multiple Google Cloud projects' data used by the model.
upvoted 3 times
...
Carlose2108
1 year, 8 months ago
Why not Option D?
upvoted 1 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: B
My Answer: B Vertex AI Pipelines: to create, deploy, and manage ML pipelines, which are essential for orchestrating your ML workflow, especially when dealing with data spanning multiple projects. Vertex AI Feature Store: It's crucial for managing feature data across different projects. Vertex AI Experiments: track and compare the performance of different versions of your models, enabling you to experiment Why not the other: Dataplex: not specifically tailored for managing ML workflows or model training. Vertex AI ML metadata: not sufficient on its own to cover all aspects of managing the ML workflow across multiple projects. Vertex AI TensorBoard: not specifically designed for managing the end-to-end ML workflow or tracking model versions across multiple projects.
upvoted 2 times
tavva_prudhvi
1 year, 6 months ago
I feel, Vertex AI Feature Store is valuable for managing and serving features for ML models, but it doesn't address the need for tracking experiments and managing metadata, right?
upvoted 1 times
...
...
SKDE
1 year, 9 months ago
Selected Answer: B
Dataplex works well with the data across projects and even on-prem, but doesn't work well with the ML related data like tracking and performance. So options A and C are considered wrong. Metadata is to store metadata. So it is not required while we consider to compare the model performance. So option D is wrong. On the other hand Feature store brings meaningful data for comparing the Models performance based on feature data. So Option B is correct
upvoted 1 times
...
b1a8fae
1 year, 9 months ago
Selected Answer: C
I go with C. Dataplex to centralize different Google projects. Vertex AI experiments + ML Metadata to track experiment lineage, parameter usage etc and compare models.
upvoted 4 times
daidai75
1 year, 9 months ago
How about Option D? the Pipeline can also do the cross project data processing.
upvoted 5 times
...
...

Topic 1 Question 231

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 231 discussion

You are using Keras and TensorFlow to develop a fraud detection model. Records of customer transactions are stored in a large table in BigQuery. You need to preprocess these records in a cost-effective and efficient way before you use them to train the model. The trained model will be used to perform batch inference in BigQuery. How should you implement the preprocessing workflow?

  • A. Implement a preprocessing pipeline by using Apache Spark, and run the pipeline on Dataproc. Save the preprocessed data as CSV files in a Cloud Storage bucket.
  • B. Load the data into a pandas DataFrame. Implement the preprocessing steps using pandas transformations, and train the model directly on the DataFrame.
  • C. Perform preprocessing in BigQuery by using SQL. Use the BigQueryClient in TensorFlow to read the data directly from BigQuery.
  • D. Implement a preprocessing pipeline by using Apache Beam, and run the pipeline on Dataflow. Save the preprocessed data as CSV files in a Cloud Storage bucket.
Suggested Answer: C 🗳️

Comments

b1a8fae
Highly Voted 1 year, 3 months ago
Selected Answer: C
Easiest to preprocess the data on BigQuery.
upvoted 7 times
...
pinimichele01
Most Recent 1 year, 1 month ago
Selected Answer: C
went with C
upvoted 2 times
pinimichele01
1 year ago
Easiest to preprocess the data on BigQuery.
upvoted 2 times
...
...
pikachu007
1 year, 4 months ago
Selected Answer: C
A. Spark on Dataproc: While powerful, it incurs additional cluster setup and management costs, potentially less cost-effective for this specific use case. B. pandas DataFrame: Loading large datasets into memory might lead to resource constraints and performance issues, especially for large-scale preprocessing. D. Apache Beam on Dataflow: While scalable, it introduces extra complexity for managing a separate pipeline and storage for preprocessed data.
upvoted 4 times
...
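Option C's "preprocess in BigQuery by using SQL" step can be sketched as follows. Table and column names are hypothetical; the query string is the testable artifact, and the client call is deferred since it needs GCP credentials:

```python
# Sketch of option C: push preprocessing into BigQuery SQL so the large
# transaction table never leaves the warehouse. Dataset/table/column
# names are hypothetical.

def preprocessing_query(table: str) -> str:
    """Build a server-side preprocessing query (z-score an amount column,
    extract an hour-of-day feature)."""
    return f"""
    CREATE OR REPLACE TABLE `my_dataset.transactions_prepped` AS
    SELECT
      customer_id,
      SAFE_DIVIDE(amount - AVG(amount) OVER (), STDDEV(amount) OVER ()) AS amount_z,
      EXTRACT(HOUR FROM transaction_ts) AS txn_hour,
      is_fraud
    FROM `{table}`
    """

def run_in_bigquery(sql: str):
    """Execute the query server-side (requires GCP credentials)."""
    from google.cloud import bigquery  # deferred: cloud-only dependency
    bigquery.Client().query(sql).result()

sql = preprocessing_query("my_dataset.transactions")
print("AVG(amount) OVER ()" in sql)  # True
```

The second half of option C, reading the prepped table directly into the Keras training loop, would then use the BigQuery reader from tensorflow-io rather than exporting CSVs.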

Topic 1 Question 232

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 232 discussion

You need to use TensorFlow to train an image classification model. Your dataset is located in a Cloud Storage directory and contains millions of labeled images. Before training the model, you need to prepare the data. You want the data preprocessing and model training workflow to be as efficient, scalable, and low maintenance as possible. What should you do?

  • A. 1. Create a Dataflow job that creates sharded TFRecord files in a Cloud Storage directory.
    2. Reference tf.data.TFRecordDataset in the training script.
    3. Train the model by using Vertex AI Training with a V100 GPU.
  • B. 1. Create a Dataflow job that moves the images into multiple Cloud Storage directories, where each directory is named according to the corresponding label.
    2. Reference tfds.folder_dataset.ImageFolder in the training script.
    3. Train the model by using Vertex AI Training with a V100 GPU.
  • C. 1. Create a Jupyter notebook that uses an n1-standard-64, V100 GPU Vertex AI Workbench instance.
    2. Write a Python script that creates sharded TFRecord files in a directory inside the instance.
    3. Reference tf.data.TFRecordDataset in the training script.
    4. Train the model by using the Workbench instance.
  • D. 1. Create a Jupyter notebook that uses an n1-standard-64, V100 GPU Vertex AI Workbench instance.
    2. Write a Python script that copies the images into multiple Cloud Storage directories, where each directory is named according to the corresponding label.
    3. Reference tfds.folder_dataset.ImageFolder in the training script.
    4. Train the model by using the Workbench instance.
Suggested Answer: A 🗳️

Comments

pinimichele01
Highly Voted 1 year, 7 months ago
Selected Answer: A
millions of labeled images -> dataflow tfrecord faster than folder-based
upvoted 8 times
...
AzureDP900
Most Recent 1 year, 4 months ago
A is correct. Here's why: you need to prepare the data before training an image classification model. Using TFRecord files allows you to store your data in a format that can be efficiently read and processed by TensorFlow. Sharding the data into multiple files allows for parallel processing and scalability. Dataflow is a Google Cloud service that provides a scalable and reliable way to process large datasets. By using Vertex AI Training with a V100 GPU, you can train your model in an efficient and cost-effective manner.
upvoted 1 times
...
b1a8fae
1 year, 9 months ago
Selected Answer: A
Ideally you want to export your data in TFRecords (most efficient image format) in Cloud Storage, and not in the instance (to improve scalability)
upvoted 4 times
...
pikachu007
1 year, 10 months ago
Selected Answer: A
B. Folder-Based Structure: While viable, it's less efficient for large datasets compared to TFRecord files, potentially leading to slower I/O during training. C. Workbench Processing: Local preprocessing on a single instance can be less scalable and efficient for millions of images, potentially introducing bottlenecks. D. Workbench Training: While Workbench offers a Jupyter environment, Vertex AI Training is specifically designed for scalable model training, providing optimized hardware and infrastructure.
upvoted 2 times
...
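The sharded-TFRecord layout option A produces follows TensorFlow's standard NNNNN-of-NNNNN naming, which tf.data.TFRecordDataset can then consume in parallel. A small sketch, with a hypothetical bucket path:

```python
# Sketch of the shard layout a Dataflow job would write for option A.
# Bucket/prefix names are hypothetical.

def shard_names(prefix: str, num_shards: int) -> list:
    """Standard TensorFlow shard naming, e.g. train-00000-of-00016."""
    return [f"{prefix}-{i:05d}-of-{num_shards:05d}" for i in range(num_shards)]

files = shard_names("gs://my-bucket/tfrecords/train", 16)
print(files[0])    # gs://my-bucket/tfrecords/train-00000-of-00016
print(len(files))  # 16

# In the training script (requires tensorflow, not imported here):
#   ds = tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)
```

Sharding matters at this scale: many medium-sized files let the input pipeline read in parallel, which a single directory of millions of loose JPEGs (options B/D) cannot match.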

Topic 1 Question 233

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 233 discussion

You are building a custom image classification model and plan to use Vertex AI Pipelines to implement the end-to-end training. Your dataset consists of images that need to be preprocessed before they can be used to train the model. The preprocessing steps include resizing the images, converting them to grayscale, and extracting features. You have already implemented some Python functions for the preprocessing tasks. Which components should you use in your pipeline?

  • A. DataprocSparkBatchOp and CustomTrainingJobOp
  • B. DataflowPythonJobOp, WaitGcpResourcesOp, and CustomTrainingJobOp
  • C. dsl.ParallelFor, dsl.component, and CustomTrainingJobOp
  • D. ImageDatasetImportDataOp, dsl.component, and AutoMLImageTrainingJobRunOp
Suggested Answer: B 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 8 months ago
Selected Answer: B
My Answer: B Looking for the options, DataflowPythonJobOp can be used for parallelizing the preprocessing tasks, which is suitable for image resizing, converting to grayscale, and extracting features. dsl.ParallelFor could be useful for parallelizing tasks but might not be the most straightforward option for image preprocessing. Generally DataflowPythonJobOp is followed by WaitGcpResourcesOp. https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/fe7d3e4b8edc137d90ec061789b879b7cc8d3854/notebooks/community/ml_ops/stage3/get_started_with_dataflow_flex_template_component.ipynb
upvoted 6 times
...
Dirtie_Sinkie
Most Recent 1 year, 1 month ago
Selected Answer: B
B is definitely right, no doubt
upvoted 2 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: B
https://cloud.google.com/vertex-ai/docs/pipelines/dataflow-component#dataflowpythonjobop
upvoted 1 times
...
b1a8fae
1 year, 9 months ago
Selected Answer: B
I go with B. Custom training is surely required. Discarding A because Spark is not mentioned anywhere in the problem description. C involves Kubeflow which seems a bit overkill imo. DataflowPythonJobOp operator lets you create a Vertex AI Pipelines component that prepares data -> seems like the appropriate course of action to me. https://cloud.google.com/vertex-ai/docs/pipelines/dataflow-component#dataflowpythonjobop
upvoted 2 times
...
pikachu007
1 year, 10 months ago
Selected Answer: B
A. DataprocSparkBatchOp: While capable of data processing, it's less well-suited for image-specific tasks like resizing and grayscale conversion compared to DataflowPythonJobOp. C. dsl.ParallelFor, dsl.component: While offering flexibility, they require more manual orchestration and potentially less efficient for image preprocessing compared to DataflowPythonJobOp. D. ImageDatasetImportDataOp, AutoMLImageTrainingJobRunOp: These components are designed for AutoML Image training, not directly compatible with custom preprocessing and training tasks.
upvoted 2 times
...
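The component chain in option B (Dataflow preprocessing, wait, then custom training) can be sketched as a Vertex AI pipeline. This assumes the kfp and google-cloud-pipeline-components v1 packages; imports are deferred inside the builder so the snippet runs without them installed, and all resource paths are placeholders:

```python
# Hedged sketch of option B's pipeline. Component names come from
# google-cloud-pipeline-components v1; project, bucket, and worker
# specs are placeholders, not a tested configuration.

PIPELINE_ORDER = ["DataflowPythonJobOp", "WaitGcpResourcesOp", "CustomTrainingJobOp"]

def build_pipeline():
    # Deferred imports: only needed when actually compiling the pipeline.
    from kfp import dsl
    from google_cloud_pipeline_components.v1.dataflow import DataflowPythonJobOp
    from google_cloud_pipeline_components.v1.wait_gcp_resources import WaitGcpResourcesOp
    from google_cloud_pipeline_components.v1.custom_job import CustomTrainingJobOp

    @dsl.pipeline(name="image-preprocess-and-train")
    def pipeline():
        # Run the existing Python preprocessing functions as a Dataflow job.
        preprocess = DataflowPythonJobOp(
            project="my-project",
            location="us-central1",
            python_module_path="gs://my-bucket/preprocess.py",
            temp_location="gs://my-bucket/tmp",
        )
        # Block until the Dataflow job finishes.
        wait = WaitGcpResourcesOp(gcp_resources=preprocess.outputs["gcp_resources"])
        # Then launch the custom training job.
        CustomTrainingJobOp(
            project="my-project",
            location="us-central1",
            display_name="image-classifier",
            worker_pool_specs=[...],  # container + machine spec for the training code
        ).after(wait)

    return pipeline

print(" -> ".join(PIPELINE_ORDER))
```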

Topic 1 Question 234

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 234 discussion

You work for a retail company that is using a regression model built with BigQuery ML to predict product sales. This model is being used to serve online predictions. Recently you developed a new version of the model that uses a different architecture (custom model). Initial analysis revealed that both models are performing as expected. You want to deploy the new version of the model to production and monitor the performance over the next two months. You need to minimize the impact to the existing and future model users. How should you deploy the model?

  • A. Import the new model to the same Vertex AI Model Registry as a different version of the existing model. Deploy the new model to the same Vertex AI endpoint as the existing model, and use traffic splitting to route 95% of production traffic to the BigQuery ML model and 5% of production traffic to the new model.
  • B. Import the new model to the same Vertex AI Model Registry as the existing model. Deploy the models to one Vertex AI endpoint. Route 95% of production traffic to the BigQuery ML model and 5% of production traffic to the new model.
  • C. Import the new model to the same Vertex AI Model Registry as the existing model. Deploy each model to a separate Vertex AI endpoint.
  • D. Deploy the new model to a separate Vertex AI endpoint. Create a Cloud Run service that routes the prediction requests to the corresponding endpoints based on the input feature values.
Suggested Answer: A 🗳️

Comments

fitri001
Highly Voted 1 year, 6 months ago
Selected Answer: A
Minimal Disruption: Deploying the new model to the same endpoint avoids changes for existing users. Traffic splitting ensures a gradual rollout, minimizing any potential impact on production. Performance Monitoring: By routing a small percentage of traffic (5%) to the new model, you can monitor its performance in a controlled environment for the next two months. Metrics like prediction accuracy and latency can be compared with the BigQuery ML model. Versioning in Model Registry: Storing both models in the same Vertex AI Model Registry with clear versioning allows easy tracking and management.
upvoted 5 times
fitri001
1 year, 6 months ago
why not others option? B. Deploying Models to One Endpoint without Traffic Splitting: This approach doesn't allow for controlled rollout and could abruptly switch all traffic to the new model, potentially causing disruptions. C. Deploying Models to Separate Endpoints: This requires users to update their prediction pipelines to interact with the new endpoint, introducing unnecessary complexity and potential delays. D. Cloud Run Service with Feature-Based Routing: While Cloud Run can route traffic, feature-based routing might be more complex to implement for sales prediction and might not be necessary with traffic splitting.
upvoted 2 times
...
...
bfdf9c8
Most Recent 1 year, 3 months ago
Selected Answer: B
I’m considering two options, A and B. Both deploy to the same endpoint and divide traffic in a similar way. However, option B is more appropriate because it generates a new model rather than just creating a new version of the existing model.
upvoted 2 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: A
https://cloud.google.com/vertex-ai/docs/general/deployment#models-endpoint
upvoted 2 times
...
Yan_X
1 year, 9 months ago
Selected Answer: A
A, no need to separate endpoint.
upvoted 4 times
...
BlehMaks
1 year, 9 months ago
Selected Answer: C
As I understand it, we need to minimize the impact to the model users, so if we take part of the traffic from the old model's users we will affect them. As for me, we should deploy the models to separate endpoints and duplicate the traffic.
upvoted 2 times
...
pikachu007
1 year, 10 months ago
Selected Answer: A
B. Doesn't Specify Traffic Splitting: Deploying models to a single endpoint without explicit traffic splitting might lead to unpredictable model selection behavior, hindering controlled evaluation. C. Separate Endpoints: While isolating models, it introduces complexity in managing multiple endpoints and routing logic, increasing operational overhead. D. Cloud Run Routing: Adds complexity by requiring a separate service to manage routing, potentially increasing latency and maintenance overhead compared to Vertex AI's built-in traffic splitting.
upvoted 2 times
...
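Option A's gradual rollout can be sketched with the google-cloud-aiplatform SDK: both versions sit behind one Vertex AI endpoint with a 95/5 traffic split. The validation helper is local and illustrative; the deploy call is deferred (it needs GCP credentials), and the endpoint/model IDs are placeholders:

```python
# Sketch of option A's canary rollout on a single Vertex AI endpoint.
# Deployed-model IDs below are placeholders.

def validate_split(split: dict) -> bool:
    """A Vertex AI traffic_split maps deployed-model IDs to integer
    percentages that must sum to 100."""
    return sum(split.values()) == 100 and all(0 <= v <= 100 for v in split.values())

split = {"bqml-model": 95, "custom-model": 5}
print(validate_split(split))  # True

def deploy_canary(endpoint_name: str, model_name: str):
    """Deploy the new version to the existing endpoint with 5% of traffic
    (requires GCP credentials)."""
    from google.cloud import aiplatform  # deferred: cloud-only dependency
    endpoint = aiplatform.Endpoint(endpoint_name)
    model = aiplatform.Model(model_name)
    endpoint.deploy(model=model, traffic_percentage=5)
```

Because existing clients keep calling the same endpoint URL, the 5% canary is invisible to them, which is what "minimize the impact to existing and future model users" asks for.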

Topic 1 Question 235

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 235 discussion

You are using Vertex AI and TensorFlow to develop a custom image classification model. You need the model’s decisions and the rationale to be understandable to your company’s stakeholders. You also want to explore the results to identify any issues or potential biases. What should you do?

  • A. 1. Use TensorFlow to generate and visualize features and statistics.
    2. Analyze the results together with the standard model evaluation metrics.
  • B. 1. Use TensorFlow Profiler to visualize the model execution.
    2. Analyze the relationship between incorrect predictions and execution bottlenecks.
  • C. 1. Use Vertex Explainable AI to generate example-based explanations.
    2. Visualize the results of sample inputs from the entire dataset together with the standard model evaluation metrics.
  • D. 1. Use Vertex Explainable AI to generate feature attributions. Aggregate feature attributions over the entire dataset.
    2. Analyze the aggregation result together with the standard model evaluation metrics.
Suggested Answer: D 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 2 months ago
Selected Answer: D
My Answer: D This approach leverages Vertex Explainable AI to provide feature attributions, which helps in understanding the rationale behind the model's decisions. By aggregating these feature attributions over the entire dataset, you can gain insights into potential biases or areas of concern. Analyzing these results alongside standard model evaluation metrics allows for a comprehensive understanding of the model's performance and its interpretability. Option C is better to understand specific cases, but does not show overall contributions.
upvoted 8 times
...
gscharly
Most Recent 1 year ago
Selected Answer: D
agree with guilhermebutzke
upvoted 1 times
...
pinimichele01
1 year ago
Selected Answer: D
Debugging models: Feature attributions can help detect issues in the data that standard model evaluation techniques would usually miss.
upvoted 1 times
...
shadz10
1 year, 3 months ago
Selected Answer: D
If you inspect specific instances, and also aggregate feature attributions across your training dataset, you can get deeper insight into how your model works. Consider the following advantages: Debugging models: Feature attributions can help detect issues in the data that standard model evaluation techniques would usually miss. https://cloud.google.com/vertex-ai/docs/explainable-ai/overview
upvoted 4 times
...
b1a8fae
1 year, 3 months ago
Selected Answer: C
C. Example-based explanations make more sense in this case than feature based attributions (we want to understand with examples what kind of decisions the model takes; also explore the amount of bias in a visual, understandable way) https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#example-based
upvoted 3 times
el_vampiro
2 months ago
Example-based explanations make sense when there are two classes that are getting confused with each other. It does not help with the question
upvoted 2 times
...
...
pikachu007
1 year, 4 months ago
Selected Answer: D
Feature-Level Insights: Feature attributions pinpoint which image regions contribute most to predictions, offering granular understanding of model reasoning. Bias Detection: Aggregating feature attributions over the entire dataset can reveal systematic biases or patterns of model behavior, helping identify potential fairness issues. Complementary to Evaluation Metrics: Combining attributions with standard metrics (e.g., accuracy, precision, recall) provides a comprehensive view of model performance and fairness.
upvoted 3 times
...
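The aggregation step option D describes (and guilhermebutzke's "aggregating these feature attributions over the entire dataset") is a simple mean over per-example attributions. A minimal local sketch; the attribution records here are made-up stand-ins for what Vertex Explainable AI would return:

```python
# Sketch of option D's aggregation: mean absolute attribution per feature
# across the dataset, to surface systematically influential features
# (e.g. a background signal that hints at bias). Values are illustrative.

def aggregate_attributions(per_example: list) -> dict:
    """Mean absolute attribution per feature across all examples."""
    totals = {}
    for attribution in per_example:
        for feature, value in attribution.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    n = len(per_example)
    return {feature: total / n for feature, total in totals.items()}

attributions = [
    {"edge_density": 0.40, "background_hue": 0.05},
    {"edge_density": 0.30, "background_hue": 0.25},  # suspiciously strong background signal
]
agg = aggregate_attributions(attributions)
print(round(agg["edge_density"], 2))  # 0.35
```

A feature like `background_hue` scoring high in the aggregate, despite being semantically irrelevant, is exactly the kind of dataset bias this analysis is meant to catch.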

Topic 1 Question 236

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 236 discussion

You work for a large retailer, and you need to build a model to predict customer churn. The company has a dataset of historical customer data, including customer demographics, purchase history, and website activity. You need to create the model in BigQuery ML and thoroughly evaluate its performance. What should you do?

  • A. Create a linear regression model in BigQuery ML, and register the model in Vertex AI Model Registry. Evaluate the model performance in Vertex AI.
  • B. Create a logistic regression model in BigQuery ML and register the model in Vertex AI Model Registry. Evaluate the model performance in Vertex AI.
  • C. Create a linear regression model in BigQuery ML. Use the ML.EVALUATE function to evaluate the model performance.
  • D. Create a logistic regression model in BigQuery ML. Use the ML.CONFUSION_MATRIX function to evaluate the model performance.
Suggested Answer: B 🗳️

Comments

gscharly
Highly Voted 1 year, 6 months ago
Selected Answer: B
logistic since it's classification, and Vertex AI because we need to "thoroughly evaluate its performance"
upvoted 5 times
...
Dirtie_Sinkie
Most Recent 1 year, 1 month ago
Selected Answer: B
B is the definitive answer. By breaking down the question we know it is a classification problem, so A and C are wrong since they're linear regression. Using confusion matrix to evaluate the model is not wrong (actually it's even the textbook answer to do it), but it is not enough if you want to thoroughly evaluate its performance. Hence the best way to do it is with Vertex AI.
upvoted 4 times
...
fitri001
1 year, 6 months ago
Selected Answer: B
Logistic Regression: While linear regression (option C) can be used for continuous prediction tasks, customer churn is a binary classification problem (churned/not churned). Logistic regression is a better fit for this scenario. Vertex AI Model Registry: Registering the model in Vertex AI Model Registry provides a centralized location for model management, versioning, and potentially future deployment to other Vertex AI services. Vertex AI Evaluation: Vertex AI offers more comprehensive evaluation tools than BigQuery ML's ML.EVALUATE function (option C) or ML.CONFUSION_MATRIX function (option D). Vertex AI can provide metrics like accuracy, ROC-AUC, precision, and recall, which are crucial for churn prediction evaluation.
upvoted 2 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: B
My Answer: B predict customer churn, which is a binary classification problem (whether a customer will churn or not). And, the phrase "thoroughly evaluate its performance" does suggest a more comprehensive approach, and in that sense, option B could be seen as a better answer than D.
upvoted 2 times
...
BlehMaks
1 year, 9 months ago
Selected Answer: B
B because Vertex AI provides us with more functions to evaluate model performance then just CONFUSION_MATRIX https://cloud.google.com/vertex-ai/docs/evaluation/introduction#classification_1
upvoted 1 times
...
b1a8fae
1 year, 9 months ago
Selected Answer: B
B. Linear regression because customer churn is a number of customers (not just 1/0). The key here imo is "thoroughly evaluate performance", which Vertex AI seems to be better suited for than BQ (including the possibility of tracking experiment lineage, inspecting parameter selection of each run, etc)
upvoted 4 times
...
pikachu007
1 year, 10 months ago
Selected Answer: D
Customer churn prediction involves a binary classification task (whether a customer will churn or not). Logistic regression is specifically designed for this type of problem, making it the appropriate model. BigQuery ML allows building and training logistic regression models directly within BigQuery, leveraging its scalability and SQL-like syntax for model development.
upvoted 3 times
...
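The BigQuery ML side of option B can be written down directly: the CREATE MODEL statement uses model_type='logistic_reg' because churn is a binary classification, and ML.EVALUATE gives the baseline metrics before the deeper evaluation in Vertex AI. Dataset and table names below are hypothetical; the SQL strings are the testable artifact:

```python
# Sketch of option B's BigQuery ML statements. `retail.*` names are
# placeholders for the company's actual dataset.

CREATE_MODEL = """
CREATE OR REPLACE MODEL `retail.churn_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT * FROM `retail.customer_features`
"""

EVALUATE = "SELECT * FROM ML.EVALUATE(MODEL `retail.churn_model`)"

print("logistic_reg" in CREATE_MODEL)  # True
```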

Topic 1 Question 237

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 237 discussion

You are developing a model to identify traffic signs in images extracted from videos taken from the dashboard of a vehicle. You have a dataset of 100,000 images that were cropped to show one out of ten different traffic signs. The images have been labeled accordingly for model training, and are stored in a Cloud Storage bucket. You need to be able to tune the model during each training run. How should you train the model?

  • A. Train a model for object detection by using Vertex AI AutoML.
  • B. Train a model for image classification by using Vertex AI AutoML.
  • C. Develop the model training code for object detection, and train a model by using Vertex AI custom training.
  • D. Develop the model training code for image classification, and train a model by using Vertex AI custom training.
Suggested Answer: D 🗳️

Comments

pikachu007
Highly Voted 1 year, 10 months ago
Selected Answer: D
Not A or B, since AutoML doesn't provide you with the flexibility to tune. Not C, because object detection is not required since the images are cropped to a single traffic sign.
upvoted 17 times
...
kornick
Most Recent 10 months ago
Selected Answer: D
Not C because object detection is not required since the images are cropped to a single traffic sign
upvoted 1 times
...
shaoshao
10 months, 3 weeks ago
Selected Answer: C
C. To "identify" a small sign is an object detection task, not a classification task.
upvoted 1 times
...
Dirtie_Sinkie
1 year, 1 month ago
Selected Answer: D
Answer is D, not C in my opinion. Object detection might be used in a real-world project since there are a lot of variables which may affect the picture like visibility, colour fading, having other things in the picture like streetlights, birds, etc. Way too overkill for our question here. I'm assuming the pictures are already nicely cropped out with none of this extra stuff in the pictures.
upvoted 1 times
...
baimus
1 year, 1 month ago
Selected Answer: C
There is literally no way to know if this is C or D, as "labelled" and "identify street signs" are too ambiguous to tell whether it's detection or classification. The "cropped to a single traffic sign" part suggests maybe D, but that's hardly ML knowledge, it's a pub quiz guess.
upvoted 2 times
...
gscharly
1 year, 6 months ago
Selected Answer: D
agree with pikachu007
upvoted 1 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: D
Not C because object detection is not required since the images are cropped to a single traffic light
upvoted 1 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: C
Correct: C The phrases "identify traffic signs in images extracted from videos" and "images that were cropped to show one out of ten different traffic signs" suggest that this is an image detection problem. The first phrase appears to have the same meaning as "images with," and the second phrase suggests that only one type of traffic sign was used in the problem, indicating that it cannot be used in a multi-class problem. For all these reasons, I believe the best option is C.
upvoted 3 times
...
guilhermebutzke
1 year, 8 months ago
My answer: C The phrases "identify traffic signs in images extracted from videos" and "images that were cropped to show one out of ten different traffic signs" suggest that this is an image detection problem. The first phrase appears to have the same meaning as "images with," and the second phrase suggests that only one type of traffic sign was used in the problem, indicating that it cannot be used in a multi-class problem. For all these reasons, I believe the best option is C.
upvoted 2 times
...

Topic 1 Question 238


Exam Professional Machine Learning Engineer topic 1 question 238 discussion

You have deployed a scikit-learn model to a Vertex AI endpoint using a custom model server. You enabled autoscaling; however, the deployed model fails to scale beyond one replica, which led to dropped requests. You notice that CPU utilization remains low even during periods of high load. What should you do?

  • A. Attach a GPU to the prediction nodes
  • B. Increase the number of workers in your model server
  • C. Schedule scaling of the nodes to match expected demand
  • D. Increase the minReplicaCount in your DeployedModel configuration
Suggested Answer: B 🗳️

Comments

sonicclasps
Highly Voted 1 year, 9 months ago
Selected Answer: A
"We generally recommend starting with one worker or thread per core. If you notice that CPU utilization is low, especially under high load, or your model is not scaling up because CPU utilization is low, then increase the number of workers." https://cloud.google.com/vertex-ai/docs/general/deployment
upvoted 8 times
sonicclasps
1 year, 9 months ago
sorry clicked wrong, answer is B
upvoted 6 times
...
...
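The guidance sonicclasps quotes can be turned into a tiny sizing rule. A minimal sketch only: the function name and the 0.5 utilization threshold are made up for illustration, not part of any Google API.

```python
import os

def recommended_workers(cpu_utilization: float, current_workers: int) -> int:
    """Sizing sketch for a custom model server, per the guidance quoted above.

    Start with one worker (or thread) per core. If CPU utilization stays
    low under high load, the server is concurrency-bound, not CPU-bound:
    autoscaling keys off CPU, which never rises, so add workers instead
    of expecting new replicas. The 0.5 threshold is an assumption.
    """
    cores = os.cpu_count() or 1
    if current_workers < cores:
        return cores  # baseline: one worker per core
    if cpu_utilization < 0.5:
        return current_workers * 2  # low CPU under load: double the workers
    return current_workers
```

With a gunicorn-based custom container, this corresponds to raising the `--workers` (or `--threads`) value in the serving command; once CPU utilization can actually climb, Vertex AI autoscaling adds replicas as intended.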
f084277
Most Recent 12 months ago
Selected Answer: B
B. One worker isn't enough to saturate the CPU and so no scaling is triggered.
upvoted 4 times
...
fitri001
1 year, 6 months ago
Selected Answer: B
agree with sonicclasps -> B
upvoted 2 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: B
agree with sonicclasps -> B
upvoted 1 times
pinimichele01
1 year, 6 months ago
NOT D: This might help ensure at least one replica is always available, but it won't address the issue of not scaling up during high load.
upvoted 1 times
...
...
Carlose2108
1 year, 8 months ago
Selected Answer: B
I went B
upvoted 2 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: C
My answer: C. The problem is in scale; the provided resources are OK. So: A: Not correct, because the CPU is enough. B: Not correct, because increasing the number of workers will accelerate processing in a single replica and make prediction faster, for example, but won't fix the scaling problem. C: Correct: this option involves adjusting the scaling of resources to match the expected demand, ensuring that the system can handle increased loads effectively. D: This might help ensure at least one replica is always available, but it won't address the issue of not scaling up during high load.
upvoted 1 times
...
pikachu007
1 year, 10 months ago
Selected Answer: B
Low CPU Utilization: Despite high load, low CPU utilization indicates underutilization of available resources, suggesting a bottleneck within the model server itself, not overall compute capacity. Worker Concurrency: Increasing the number of workers within the model server allows it to handle more concurrent requests, effectively utilizing available CPU resources and addressing the bottleneck.
upvoted 3 times
BlehMaks
1 year, 9 months ago
I don't get it. The autoscaling system should increase/decrease the number of workers itself; if we do it instead of the autoscaling system, why do we need it?
upvoted 1 times
...
guilhermebutzke
1 year, 8 months ago
Increasing the number of workers within the model server will distribute the load within the single replica, but it wouldn't address the problem of not scaling beyond one replica. Increasing workers would be a good option for a delay in prediction.
upvoted 1 times
asmgi
1 year, 3 months ago
Not scaling beyond one replica is symptom and not the source of the problem. The problem is low CPU utilization.
upvoted 1 times
...
...
...

Topic 1 Question 239


Exam Professional Machine Learning Engineer topic 1 question 239 discussion

You work for a pet food company that manages an online forum. Customers upload photos of their pets on the forum to share with others. About 20 photos are uploaded daily. You want to automatically and in near real time detect whether each uploaded photo has an animal. You want to prioritize time and minimize cost of your application development and deployment. What should you do?

  • A. Send user-submitted images to the Cloud Vision API. Use object localization to identify all objects in the image and compare the results against a list of animals.
  • B. Download an object detection model from TensorFlow Hub. Deploy the model to a Vertex AI endpoint. Send new user-submitted images to the model endpoint to classify whether each photo has an animal.
  • C. Manually label previously submitted images with bounding boxes around any animals. Build an AutoML object detection model by using Vertex AI. Deploy the model to a Vertex AI endpoint. Send new user-submitted images to your model endpoint to detect whether each photo has an animal.
  • D. Manually label previously submitted images as having animals or not. Create an image dataset on Vertex AI. Train a classification model by using Vertex AutoML to distinguish the two classes. Deploy the model to a Vertex AI endpoint. Send new user-submitted images to your model endpoint to classify whether each photo has an animal.
Suggested Answer: A 🗳️

Comments

b1a8fae
Highly Voted 1 year, 9 months ago
Selected Answer: A
A. B would also work, and I wonder if the cost would be lower, but going with the Google-hosted service is usually the most likely choice to be correct.
upvoted 11 times
Dagogi96
1 year, 9 months ago
I think the same: if the question mentions other services and gives you an alternative that Google has, obviously the "best option" is Google's, although I also think that a model downloaded from a hub would possibly save us a few euros.
upvoted 4 times
...
louisaok
1 year ago
Agree. The main purpose of Google certification is to promote GCP services.
upvoted 3 times
...
...
rajshiv
Most Recent 11 months, 1 week ago
Selected Answer: D
Option D is optimal as it uses image classification with Vertex AutoML, which is simple to implement, cost-effective, and scalable.
upvoted 1 times
...
d6e1ae4
1 year, 2 months ago
Selected Answer: D
The labeling process is simpler than object detection, as it's just a binary classification. AutoML simplifies the model creation process, reducing development time. For the relatively low volume of images (20 per day), this solution is likely to be cost-effective in the long run. Why not A? Cloud Vision is overkill for a binary classification and it is very expensive.
upvoted 1 times
...
gscharly
1 year, 6 months ago
Selected Answer: A
agree with b1a8fae
upvoted 1 times
...
CHARLIE2108
1 year, 9 months ago
Selected Answer: B
I went Option B
upvoted 2 times
...
shadz10
1 year, 10 months ago
Selected Answer: A
As minimising time and cost are of priority and considering the small subset of images I believe A is the best option
upvoted 4 times
...

Topic 1 Question 240


Exam Professional Machine Learning Engineer topic 1 question 240 discussion

You work at a mobile gaming startup that creates online multiplayer games. Recently, your company observed an increase in players cheating in the games, leading to a loss of revenue and a poor user experience. You built a binary classification model to determine whether a player cheated after a completed game session, and then send a message to other downstream systems to ban the player that cheated. Your model has performed well during testing, and you now need to deploy the model to production. You want your serving solution to provide immediate classifications after a completed game session to avoid further loss of revenue. What should you do?

  • A. Import the model into Vertex AI Model Registry. Use the Vertex Batch Prediction service to run batch inference jobs.
  • B. Save the model files in a Cloud Storage bucket. Create a Cloud Function to read the model files and make online inference requests on the Cloud Function.
  • C. Save the model files in a VM. Load the model files each time there is a prediction request, and run an inference job on the VM
  • D. Import the model into Vertex AI Model Registry. Create a Vertex AI endpoint that hosts the model, and make online inference requests.
Suggested Answer: D 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 2 months ago
My answer: D A: Not correct: Batch Prediction is designed for offline processing of large datasets, not for immediate real-time predictions needed in this scenario. B: Not correct: While Cloud Functions offer real-time processing, loading the model files each time might introduce latency, especially for larger models C: Not correct: Using a VM is less scalable and more complex to manage compared to other options. D: CORRECT: Vertex AI Model Registry ensures proper model management, versioning, and access control while Vertex AI endpoint provides a highly scalable and managed solution for real-time online inference, ensuring immediate predictions after game sessions.
upvoted 7 times
...
fitri001
Highly Voted 1 year ago
Selected Answer: D
Low Latency: Vertex AI Endpoints are specifically designed for low-latency online inference. They offer automatic scaling and efficient resource allocation, ensuring quick responses to game session completion signals. Real-time Decisions: This deployment method allows your game backend to send data from finished game sessions to the Vertex AI endpoint in near real-time. The endpoint can then make classifications (cheater or not cheater) promptly. Managed Service: Vertex AI handles the infrastructure management and scaling of your model, freeing you from managing servers or virtual machines (VMs).
upvoted 6 times
fitri001
1 year ago
A. Vertex Batch Prediction: Batch prediction is designed for offline processing of large datasets, not real-time inference on individual game sessions. B. Cloud Function with Model Files: While Cloud Functions can be triggered by events, reading the model files each time and running inference can introduce latency. This might not be ideal for immediate classifications. C. Model Files in a VM: Loading the model on a VM for each inference request incurs significant overhead and latency. This approach is not suitable for real-time processing.
upvoted 3 times
...
...
pikachu007
Most Recent 1 year, 4 months ago
Selected Answer: D
Option A: Batch prediction is too slow for your needs. Option B: Cloud Functions are ideal for short-lived tasks, not for continuously serving models. Loading the model on every request would be inefficient. Option C: VMs offer less scalability and management overhead compared to Vertex AI.
upvoted 3 times
sonicclasps
1 year, 3 months ago
Although the game is multiplayer, you could submit requests for all the players in the game that just ended as a batch, so I think A is also an option.
upvoted 1 times
...
...

Topic 1 Question 241


Exam Professional Machine Learning Engineer topic 1 question 241 discussion

You have created a Vertex AI pipeline that automates custom model training. You want to add a pipeline component that enables your team to most easily collaborate when running different executions and comparing metrics both visually and programmatically. What should you do?

  • A. Add a component to the Vertex AI pipeline that logs metrics to a BigQuery table. Query the table to compare different executions of the pipeline. Connect BigQuery to Looker Studio to visualize metrics.
  • B. Add a component to the Vertex AI pipeline that logs metrics to a BigQuery table. Load the table into a pandas DataFrame to compare different executions of the pipeline. Use Matplotlib to visualize metrics.
  • C. Add a component to the Vertex AI pipeline that logs metrics to Vertex ML Metadata. Use Vertex AI Experiments to compare different executions of the pipeline. Use Vertex AI TensorBoard to visualize metrics.
  • D. Add a component to the Vertex AI pipeline that logs metrics to Vertex ML Metadata. Load the Vertex ML Metadata into a pandas DataFrame to compare different executions of the pipeline. Use Matplotlib to visualize metrics.
Suggested Answer: C 🗳️

Comments

gscharly
Highly Voted 1 year, 6 months ago
Selected Answer: C
went with C. Experiments can be used to compare executions and metrics
upvoted 6 times
...
Omi_04040
Most Recent 11 months ago
Selected Answer: C
Vertex ML Metadata and Vertex AI Experiments provide APIs and SDKs that allow you to access and analyze metrics programmatically. This enables you to automate comparisons, generate reports, or perform custom analysis on your pipeline executions.
upvoted 1 times
...
baimus
1 year, 1 month ago
I can see why C is tempting, but Vertex AI Experiments isn't actually required here, just a nice-to-have, whereas Workbench is actually required as they say "visually AND programmatically". It's literally the only answer that allows programmatic comparison of the data in the metadata store.
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: A
Why A? BigQuery: Stores pipeline metrics from different executions in a central location, allowing easy access for team members. BigQuery Queries: Enables programmatic comparison of metrics across runs using SQL queries. Looker Studio: Provides a collaborative visualization platform for team members to explore and compare metrics visually. why not C? Vertex AI Experiments and TensorBoard: While Vertex AI Experiments can leverage ML Metadata for lineage tracking, it's not ideal for general metric comparison. TensorBoard is primarily for visualizing training data during the pipeline execution, not comparing results across runs.
upvoted 4 times
pinimichele01
1 year, 6 months ago
why log on BQ and not to MetadataAI?
upvoted 1 times
...
asmgi
1 year, 3 months ago
Isn't BQ too much for a dozen metrics?
upvoted 1 times
...
...
b1a8fae
1 year, 9 months ago
Selected Answer: C
Clearly C.
upvoted 3 times
...
winston9
1 year, 10 months ago
Selected Answer: C
C is the correct one here
upvoted 2 times
...
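Whichever backend the metrics land in, "comparing executions programmatically" ultimately reduces to something like the toy sketch below. The run names and metric keys are invented for illustration; with answer C, the Vertex AI SDK can hand you the same table via something like `aiplatform.get_experiment_df()`.

```python
def best_run(runs: dict[str, dict[str, float]], metric: str,
             higher_is_better: bool = True) -> str:
    """Return the name of the execution with the best value for `metric`."""
    scored = {name: m[metric] for name, m in runs.items() if metric in m}
    if not scored:
        raise ValueError(f"no run logged metric {metric!r}")
    pick = max if higher_is_better else min
    return pick(scored, key=scored.get)

# Hypothetical metrics logged by three pipeline executions:
runs = {
    "run-2024-01-01": {"auc": 0.91, "loss": 0.33},
    "run-2024-01-02": {"auc": 0.94, "loss": 0.29},
    "run-2024-01-03": {"auc": 0.93, "loss": 0.31},
}
best_auc = best_run(runs, "auc")                            # "run-2024-01-02"
best_loss = best_run(runs, "loss", higher_is_better=False)  # "run-2024-01-02"
```

The point of option C is that logging to Vertex ML Metadata gives the team this comparison for free in the Experiments UI and SDK, instead of maintaining the plumbing themselves.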

Topic 1 Question 242


Exam Professional Machine Learning Engineer topic 1 question 242 discussion

Your team is training a large number of ML models that use different algorithms, parameters, and datasets. Some models are trained in Vertex AI Pipelines, and some are trained on Vertex AI Workbench notebook instances. Your team wants to compare the performance of the models across both services. You want to minimize the effort required to store the parameters and metrics. What should you do?

  • A. Implement an additional step for all the models running in pipelines and notebooks to export parameters and metrics to BigQuery.
  • B. Create a Vertex AI experiment. Submit all the pipelines as experiment runs. For models trained on notebooks, log parameters and metrics by using the Vertex AI SDK.
  • C. Implement all models in Vertex AI Pipelines. Create a Vertex AI experiment, and associate all pipeline runs with that experiment.
  • D. Store all model parameters and metrics as model metadata by using the Vertex AI Metadata API.
Suggested Answer: B 🗳️

Comments

fitri001
1 year ago
Selected Answer: B
Why B? Centralized Tracking: Vertex AI Experiments provides a central location to track and compare models trained in both pipelines and notebooks. Reduced Overhead: Submitting pipelines as experiment runs leverages the existing pipeline infrastructure for logging and avoids creating additional pipeline steps for all models. Notebook Integration: Vertex AI SDK allows notebooks to log parameters and metrics directly to the experiment, simplifying data collection from notebooks. why not C? C. All Models in Pipelines: Moving all models to pipelines might not be feasible or desirable. Pipelines are best suited for automated, repeatable training, while notebooks offer flexibility for exploration.
upvoted 4 times
...
omermahgoub
1 year ago
Selected Answer: B
B. Create a Vertex AI experiment. Submit all the pipelines as experiment runs. For models trained on notebooks log parameters and metrics by using the Vertex AI SDK.
upvoted 2 times
...
guilhermebutzke
1 year, 2 months ago
Selected Answer: B
My Answer: B A: Not Correct: Not the best approach compared with Vertex AI experiment that does the same B: CORRECT: By submitting all pipelines as experiment runs, you can centralize the storage of parameters and metrics for models trained in Vertex AI Pipelines. This approach minimizes effort by providing a unified platform for storing and comparing model performance across different services. C: Not Correct: not feasible or ideal for models trained on Vertex AI Workbench notebook instances. D: Not Correct: If only basic parameter and metric storage is needed, and your team prioritizes simplicity over in-depth comparison, option D could be an alternative. For more complex scenarios requiring comprehensive analysis and comparison across diverse models, option B with Vertex AI Experiments
upvoted 4 times
...
b1a8fae
1 year, 3 months ago
Selected Answer: B
Divided between B and C. But logging parameters of models sounds easier than re-implementing a large amount of models as Vertex AI pipelines.
upvoted 3 times
...
shadz10
1 year, 3 months ago
Selected Answer: B
B is the correct answer here, I believe. Vertex AI Experiments provides a unified way to store and compare model runs: pipeline runs can be submitted as experiment runs, and for models trained on Vertex AI Workbench notebook instances, logging parameters and metrics using the Vertex AI SDK provides a consistent way to record the necessary information.
upvoted 1 times
...
pikachu007
1 year, 4 months ago
Selected Answer: C
Options A and B: Logging metrics to BigQuery involves additional setup and integration efforts. Option D: Loading Vertex ML Metadata into a pandas DataFrame for visualization requires manual work and doesn't leverage built-in visualization tools.
upvoted 1 times
felipepin
1 year, 2 months ago
Option B doesn't suggest logging metrics to BigQuery; hence why B is correct.
upvoted 2 times
...
...

Topic 1 Question 243


Exam Professional Machine Learning Engineer topic 1 question 243 discussion

You work on a team that builds state-of-the-art deep learning models by using the TensorFlow framework. Your team runs multiple ML experiments each week, which makes it difficult to track the experiment runs. You want a simple approach to effectively track, visualize, and debug ML experiment runs on Google Cloud while minimizing any overhead code. How should you proceed?

  • A. Set up Vertex AI Experiments to track metrics and parameters. Configure Vertex AI TensorBoard for visualization.
  • B. Set up a Cloud Function to write and save metrics files to a Cloud Storage bucket. Configure a Google Cloud VM to host TensorBoard locally for visualization.
  • C. Set up a Vertex AI Workbench notebook instance. Use the instance to save metrics data in a Cloud Storage bucket and to host TensorBoard locally for visualization.
  • D. Set up a Cloud Function to write and save metrics files to a BigQuery table. Configure a Google Cloud VM to host TensorBoard locally for visualization.
Suggested Answer: A 🗳️

Comments

b1a8fae
Highly Voted 1 year, 3 months ago
Selected Answer: A
You want to run, track, visualize ML experiments -> look no further, Vertex AI experiments.
upvoted 7 times
...
nnn245bbb
Most Recent 6 months ago
Selected Answer: A
Based on ChatGPT
upvoted 1 times
...
fitri001
1 year ago
Selected Answer: A
Built-in Tracking: Vertex AI Experiments is specifically designed for tracking ML experiments on Google Cloud. It simplifies logging metrics and parameters, eliminating the need for custom code. TensorBoard Integration: Vertex AI integrates with TensorBoard, allowing visualization of training logs and metrics directly within the Experiments interface. This provides a centralized location for both tracking and visualization. Minimized Overhead: This approach leverages existing services, minimizing the need for additional code or infrastructure setup compared to options with Cloud Functions or VMs.
upvoted 2 times
...
pikachu007
1 year, 4 months ago
Selected Answer: A
Options B and D: These options involve more setup and maintenance overhead, as they require managing Cloud Functions, VMs, and storage resources. Option C: Vertex AI Workbench is excellent for interactive experimentation, but it's not optimized for long-term experiment tracking and visualization.
upvoted 3 times
...

Topic 1 Question 244


Exam Professional Machine Learning Engineer topic 1 question 244 discussion

You work for a textile manufacturing company. Your company has hundreds of machines, and each machine has many sensors. Your team used the sensor data to build hundreds of ML models that detect machine anomalies. Models are retrained daily, and you need to deploy these models in a cost-effective way. The models must operate 24/7 without downtime and make sub-millisecond predictions. What should you do?

  • A. Deploy a Dataflow batch pipeline and a Vertex AI Prediction endpoint.
  • B. Deploy a Dataflow batch pipeline with the RunInference API, and use model refresh.
  • C. Deploy a Dataflow streaming pipeline and a Vertex AI Prediction endpoint with autoscaling.
  • D. Deploy a Dataflow streaming pipeline with the RunInference API, and use automatic model refresh.
Suggested Answer: D 🗳️

Comments

fitri001
Highly Voted 1 year, 6 months ago
Selected Answer: D
why D? Real-time Predictions: Dataflow streaming pipelines continuously process sensor data, enabling real-time anomaly detection with sub-millisecond predictions. This is crucial for immediate response to potential machine issues. RunInference API: This API allows invoking TensorFlow models directly within the Dataflow pipeline for on-the-fly inference. This eliminates the need for separate prediction endpoints and reduces latency. Automatic Model Refresh: Since models are retrained daily, automatic refresh ensures the pipeline utilizes the latest version without downtime. This is essential for maintaining model accuracy and anomaly detection effectiveness. Why not C? Dataflow Streaming Pipeline with Vertex AI Prediction Endpoint with Autoscaling: While autoscaling can handle varying workloads, Vertex AI Prediction endpoints might incur higher costs for real-time, high-volume predictions compared to invoking models directly within the pipeline using RunInference.
upvoted 9 times
...
gscharly
Most Recent 1 year, 6 months ago
Selected Answer: D
agree with fitri001
upvoted 1 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: D
With the automatic model refresh feature, when the underlying model changes, your pipeline updates to use the new model. Because the RunInference transform automatically updates the model handler, you don't need to redeploy the pipeline. With this feature, you can update your model in real time, even while the Apache Beam pipeline is running.
upvoted 2 times
pinimichele01
1 year, 7 months ago
and also the AI endpoint is not good for this online inference case
upvoted 1 times
...
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: C
My Answer: C. The phrase "The models must operate 24/7 without downtime and make sub-millisecond predictions" describes a case of online prediction (option C or D). Given "Models are retrained daily, and you need to deploy these models in a cost-effective way", choosing "Vertex AI Prediction endpoint with autoscaling" instead of "RunInference API, and use automatic model refresh" looks better because it always updates with retrained models, plus the scalability. https://cloud.google.com/blog/products/ai-machine-learning/streaming-prediction-with-dataflow-and-vertex
upvoted 4 times
...
sonicclasps
1 year, 9 months ago
Selected Answer: C
Low latency -> streaming. C & D could both work, but C is the GCP solution, so I chose C.
upvoted 3 times
vaibavi
1 year, 9 months ago
I think autoscaling will lead to downtime, at least when the replicas are updating.
upvoted 2 times
pinimichele01
1 year, 6 months ago
i agree, D is better
upvoted 1 times
...
...
asmgi
1 year, 3 months ago
I don't think autoscaling is relevant to this task, since we have the same amount of sensors at any time.
upvoted 1 times
...
...
b1a8fae
1 year, 9 months ago
Selected Answer: D
Needs to be active 24/7 -> streaming. RunInference API seems like the way to go here, using automatic model refresh on a daily basis. https://beam.apache.org/documentation/ml/about-ml/
upvoted 4 times
...
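The "automatic model refresh" in option D can be illustrated with a minimal local analogue: a handler that reloads its model whenever the artifact on disk changes, so last night's retrain is picked up without redeploying the pipeline. A toy sketch only, not Beam's actual RunInference/model-update machinery; the JSON "model" and file paths are invented for the demo.

```python
import json
import os
import tempfile

class RefreshingModelHandler:
    """Reload the model whenever the backing file changes on disk."""

    def __init__(self, path: str):
        self.path = path
        self._mtime = None
        self._model = None

    def _maybe_refresh(self) -> None:
        mtime = os.path.getmtime(self.path)
        if mtime != self._mtime:            # a retrained model landed
            with open(self.path) as f:
                self._model = json.load(f)  # stand-in for real deserialization
            self._mtime = mtime

    def predict(self, x: float) -> float:
        self._maybe_refresh()
        return self._model["weight"] * x + self._model["bias"]

# Demo: simulate a nightly retrain landing on disk.
model_path = os.path.join(tempfile.mkdtemp(), "model.json")
with open(model_path, "w") as f:
    json.dump({"weight": 2.0, "bias": 0.0}, f)

handler = RefreshingModelHandler(model_path)
before = handler.predict(3.0)   # 6.0 with the original model

with open(model_path, "w") as f:
    json.dump({"weight": 3.0, "bias": 1.0}, f)
os.utime(model_path, (0, 0))    # force a distinct mtime for the demo

after = handler.predict(3.0)    # 10.0 -- picked up without a redeploy
```

Beam's RunInference transform does essentially this for you: with automatic model updates enabled, the streaming pipeline swaps in the new artifact while running, so daily retrains never interrupt the 24/7 serving path.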

Topic 1 Question 245


Exam Professional Machine Learning Engineer topic 1 question 245 discussion

You are developing an ML model that predicts the cost of used automobiles based on data such as location, condition, model type, color, and engine/battery efficiency. The data is updated every night. Car dealerships will use the model to determine appropriate car prices. You created a Vertex AI pipeline that reads the data, splits the data into training/evaluation/test sets, performs feature engineering, trains the model by using the training dataset, and validates the model by using the evaluation dataset. You need to configure a retraining workflow that minimizes cost. What should you do?

  • A. Compare the training and evaluation losses of the current run. If the losses are similar, deploy the model to a Vertex AI endpoint. Configure a cron job to redeploy the pipeline every night.
  • B. Compare the training and evaluation losses of the current run. If the losses are similar, deploy the model to a Vertex AI endpoint with training/serving skew threshold model monitoring. When the model monitoring threshold is triggered, redeploy the pipeline.
  • C. Compare the results to the evaluation results from a previous run. If the performance improved, deploy the model to a Vertex AI endpoint. Configure a cron job to redeploy the pipeline every night.
  • D. Compare the results to the evaluation results from a previous run. If the performance improved, deploy the model to a Vertex AI endpoint with training/serving skew threshold model monitoring. When the model monitoring threshold is triggered, redeploy the pipeline.
Suggested Answer: D 🗳️

Comments

fitri001
Highly Voted 1 year ago
Selected Answer: D
Since the goal is to minimize cost while maintaining accuracy, Option D provides a more targeted approach for retraining based on the likelihood of the model being outdated due to data changes. Option B might trigger retraining more frequently even if the performance difference doesn't necessarily stem from a significant shift in the data distribution.
upvoted 7 times
fitri001
1 year ago
Option D: Utilizes training/serving skew monitoring. This specifically focuses on identifying discrepancies between the training data and the real-world data the deployed model encounters. This is a strong indicator of when the model might be outdated due to changes in the data distribution. Option B: Utilizes training/serving loss monitoring. Training loss tells you how well the model performs on the training data, while serving loss tells you how well it performs on real-world data. While high serving loss can indicate a problem, it might not necessarily be due to training/serving skew. Other factors like data quality issues or concept drift (gradual changes in the underlying data patterns) could also lead to high serving loss.
upvoted 3 times
...
...
guilhermebutzke
Highly Voted 1 year, 2 months ago
Selected Answer: D
My answer: D. A and C: Not correct: scheduling a retrain every night is not necessary since the model is performing well. B: Not correct: this approach focuses on internal consistency within the current training run (training loss versus evaluation loss). Comparing similar training and evaluation losses doesn't guarantee better performance than previous models; it is an approach to identify overfitting or assess model quality, for example. D: Correct: this approach focuses on identifying performance changes over time. Comparing to previous runs helps assess whether the new model performs better than the old one on the evaluation set, so we check if the new version is better than the old one. https://www.youtube.com/watch?v=1ykDWsnL2LE&ab_channel=GoogleCloudTech
upvoted 7 times
...
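The "compare to the previous run, deploy only on improvement" gate that options C and D describe can be sketched as a plain-Python pipeline step. This is a minimal sketch, assuming the step receives this run's evaluation metrics and the previous run's metrics as dicts; the function and metric names are illustrative, not from the Vertex AI SDK:

```python
# Hypothetical deployment gate for a nightly pipeline run: deploy only if the
# new model beats the previous run's evaluation metric.
from typing import Optional

def should_deploy(current: dict, previous: Optional[dict],
                  metric: str = "auc_roc", min_delta: float = 0.0) -> bool:
    """Deploy only if this run improves on the previous run's evaluation metric."""
    if previous is None:               # first pipeline run: nothing to compare
        return True
    return current[metric] > previous[metric] + min_delta

print(should_deploy({"auc_roc": 0.89}, {"auc_roc": 0.87}))  # True
print(should_deploy({"auc_roc": 0.86}, {"auc_roc": 0.87}))  # False
```

In option D, this gate decides deployment, and post-deployment retraining is then driven by the model monitoring skew threshold rather than a nightly cron.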
omermahgoub
Most Recent 1 year ago
Selected Answer: D
D. Compare the results to the evaluation results from a previous run. If the performance improved, deploy the model to a Vertex AI endpoint with training/serving skew threshold model monitoring. When the model monitoring threshold is triggered, redeploy the pipeline.
upvoted 3 times
pinimichele01
1 year ago
i agree, see guilhermebutzke
upvoted 1 times
...
...
pikachu007
1 year, 4 months ago
Selected Answer: B
Option A: Redeploying the pipeline every night without checking for degradation wastes resources if model performance is stable. Option C: Comparing results to a previous run doesn't guarantee model degradation detection in the current run. Option D: Comparing to a previous run and using model monitoring is redundant; model monitoring alone is sufficient.
upvoted 5 times
...

Topic 1 Question 246

Exam Professional Machine Learning Engineer topic 1 question 246 discussion

You recently used BigQuery ML to train an AutoML regression model. You shared results with your team and received positive feedback. You need to deploy your model for online prediction as quickly as possible. What should you do?

  • A. Retrain the model by using BigQuery ML, and specify Vertex AI as the model registry. Deploy the model from Vertex AI Model Registry to a Vertex AI endpoint.
  • B. Retrain the model by using Vertex AI. Deploy the model from Vertex AI Model Registry to a Vertex AI endpoint.
  • C. Alter the model by using BigQuery ML, and specify Vertex AI as the model registry. Deploy the model from Vertex AI Model Registry to a Vertex AI endpoint.
  • D. Export the model from BigQuery ML to Cloud Storage. Import the model into Vertex AI Model Registry. Deploy the model to a Vertex AI endpoint.
Suggested Answer: C 🗳️

Comments

cruise93
Highly Voted 1 year, 6 months ago
Selected Answer: C
https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-alter-model#alter_model_statement
upvoted 8 times
...
pikachu007
Highly Voted 1 year, 10 months ago
Selected Answer: D
I think it's D, as model retraining should not be required unless it's specified there's new data.
upvoted 7 times
shadz10
1 year, 10 months ago
I agree with pikachu007
upvoted 2 times
vaibavi
1 year, 9 months ago
I think it's C. Exported models for the model types AUTOML_REGRESSOR and AUTOML_CLASSIFIER do not support AI Platform deployment for online prediction.
upvoted 5 times
...
...
daidai75
1 year, 9 months ago
Agree with Pikachu007, the option D is good.
upvoted 1 times
...
Dagogi96
1 year, 9 months ago
It's C, friends; with ALTER MODEL you can register the model in Vertex AI. I work at a company and have registered models this way myself.
upvoted 7 times
...
...
bigdapper
Most Recent 2 months, 1 week ago
Selected Answer: C
Ans: C (ALTER MODEL) https://cloud.google.com/bigquery/docs/managing-models-vertex#sql
upvoted 2 times
...
phani49
10 months, 3 weeks ago
Selected Answer: C
You can use the ALTER MODEL statement to register your existing BigQuery ML model with Vertex AI Model Registry https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-alter-model
upvoted 4 times
...
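As the comments above note, option C boils down to a single SQL statement. A minimal sketch, with placeholder dataset, model, and registry names, embedded in Python so it could be submitted through the BigQuery client:

```python
# Sketch of option C: register an existing BigQuery ML model with Vertex AI
# Model Registry via ALTER MODEL, with no export to Cloud Storage needed.
# `my_dataset.my_automl_model` and the vertex_ai_model_id are placeholders.
alter_sql = """
ALTER MODEL IF EXISTS `my_dataset.my_automl_model`
SET OPTIONS (vertex_ai_model_id = 'bqml_my_automl_model');
"""

# With google-cloud-bigquery installed and credentials configured, this would be:
# from google.cloud import bigquery
# bigquery.Client().query(alter_sql).result()
print(alter_sql.strip())
```

After the statement runs, the model appears in Vertex AI Model Registry and can be deployed to an endpoint from there.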
Omi_04040
11 months ago
Selected Answer: C
No need to export the model to Cloud Storage
upvoted 1 times
...
lunalongo
11 months, 3 weeks ago
Selected Answer: C
C is the best option because: 1) Retraining the model (A/B) is not necessary (see the positive feedback) 2) Exporting to Cloud Storage (D) is not necessary, since you can use the ALTER MODEL statement to register it in Vertex AI Model Registry and deploy it to a Vertex AI endpoint from there 3) Using BigQuery ML without exporting the model is the quickest option
upvoted 3 times
...
lunalongo
11 months, 3 weeks ago
D is the best option because: 1) BigQuery ML is excellent for training, not so much for online prediction 2) Vertex AI provides a more robust and scalable infrastructure 3) Exporting the model from BigQuery ML to a format compatible with Vertex AI (typically via Cloud Storage) is required ***A, B, and C attempt to deploy directly from BigQuery ML, which isn't a supported workflow.
upvoted 1 times
...
Land3r
11 months, 3 weeks ago
Selected Answer: C
https://cloud.google.com/bigquery/docs/managing-models-vertex#register-new-bqml-model-version
upvoted 1 times
...
hybridpro
1 year, 1 month ago
Selected Answer: C
It's C
upvoted 1 times
...
d6e1ae4
1 year, 2 months ago
Selected Answer: D
The model has already been trained and received positive feedback, so there's no need to retrain the model.
upvoted 1 times
...
AzureDP900
1 year, 4 months ago
C is correct. Here's why: 1) You trained an AutoML regression model using BigQuery ML. 2) To deploy the model for online prediction, you need to export the model in a format that is compatible with Vertex AI. 3) Altering the model by using BigQuery ML and specifying Vertex AI as the model registry allows you to export the model in the correct format. Once exported, you can deploy the model from Vertex AI Model Registry to a Vertex AI endpoint, which enables online prediction.
upvoted 1 times
...
gscharly
1 year, 6 months ago
Selected Answer: C
https://cloud.google.com/vertex-ai/docs/model-registry/model-registry-bqml https://cloud.google.com/bigquery/docs/update_vertex
upvoted 3 times
...
fitri001
1 year, 6 months ago
Selected Answer: D
upvoted 2 times
fitri001
1 year, 6 months ago
No Retraining: You've already trained a successful model in BigQuery ML. Retraining (Options A, B, and C) is unnecessary and adds time. Direct Deployment: Option D leverages existing tools for streamlined deployment. You export the model directly from BigQuery ML and import it into Vertex AI Model Registry for centralized management. Finally, you deploy the model to a Vertex AI endpoint for online predictions. Cloud Storage: Cloud Storage provides a readily accessible location to store your exported model before deployment.
upvoted 1 times
pinimichele01
1 year, 6 months ago
alter the model doesn't mean retrain...
upvoted 3 times
...
...
...
omermahgoub
1 year, 7 months ago
Selected Answer: D
D. Export the model from BigQuery ML to Cloud Storage. Import the model into Vertex AI Model Registry. Deploy the model to a Vertex AI endpoint.
upvoted 1 times
pinimichele01
1 year, 6 months ago
why not C? it is not necessary to export in GCS
upvoted 1 times
omermahgoub
1 year, 6 months ago
I changed my answer to C. GCS is not necessary
upvoted 2 times
...
...
...
playerXL7
1 year, 7 months ago
Selected Answer: C
https://cloud.google.com/vertex-ai/docs/model-registry/model-registry-bqml
upvoted 1 times
...
alfieroy16
1 year, 7 months ago
Selected Answer: C
Alter the model is correct, no need to export the model: "You can register BigQuery ML models with the Model Registry, in order to manage them alongside your other ML models without needing to export them" https://cloud.google.com/bigquery/docs/managing-models-vertex A simple update is sufficient: https://cloud.google.com/bigquery/docs/update_vertex
upvoted 1 times
...

Topic 1 Question 247

Exam Professional Machine Learning Engineer topic 1 question 247 discussion

You built a deep learning-based image classification model by using on-premises data. You want to use Vertex AI to deploy the model to production. Due to security concerns, you cannot move your data to the cloud. You are aware that the input data distribution might change over time. You need to detect model performance changes in production. What should you do?

  • A. Use Vertex Explainable AI for model explainability. Configure feature-based explanations.
  • B. Use Vertex Explainable AI for model explainability. Configure example-based explanations.
  • C. Create a Vertex AI Model Monitoring job. Enable training-serving skew detection for your model.
  • D. Create a Vertex AI Model Monitoring job. Enable feature attribution skew and drift detection for your model.
Suggested Answer: D 🗳️

Comments

b1a8fae
Highly Voted 1 year, 9 months ago
Selected Answer: D
D. You want to control how much the distribution of the data changes over time -> that's drift.
upvoted 9 times
...
sonicclasps
Highly Voted 1 year, 9 months ago
Selected Answer: D
the answer cannot be C, cause your training data is not available in production. So D is the only viable answer
upvoted 8 times
...
OpenKnowledge
Most Recent 1 month, 2 weeks ago
Selected Answer: D
D is the best answer among the options here. Drift detection will indicate changes in model performance. Option D also enables feature attribution skew detection, which needs access to the on-premises data. The way to get access to on-premises data is to integrate it with Vertex AI, which typically involves migrating your data to Google Cloud (not possible for this use case), establishing a secure private connection, or using Google Distributed Cloud (GDC) as an on-premises solution (expensive and complex compared to the secure private connectivity solution). For secure private connectivity with an on-premises environment, you can establish a private connection between your on-premises network and Google Cloud using Private Service Connect (PSC) with Cloud VPN or Cloud. You use PSC to create a private endpoint for accessing Vertex AI services from your on-premises network.
upvoted 1 times
...
rajshiv
11 months, 2 weeks ago
Selected Answer: C
Option D is incorrect in my opinion. "Feature attribution skew and drift detection" focus on knowing how feature values are contributing to the model’s predictions, but "training-serving skew" is more direct and efficient to detect distribution changes that could lead to performance issues
upvoted 3 times
...
gscharly
1 year, 6 months ago
Selected Answer: D
D, as the training data is not available
upvoted 2 times
...
fitri001
1 year, 6 months ago
Selected Answer: D
Security: Vertex AI Model Monitoring doesn't require uploading your training data to the cloud. It analyzes model predictions and input features on your on-premises server. Data Distribution Shifts: Feature attribution techniques like LIME or SHAP within Vertex AI Model Monitoring can identify how different features contribute to model predictions. Detecting drifts in these feature attributions can indicate changes in the underlying data distribution compared to the training data.
upvoted 1 times
...
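Whichever monitoring option you pick, skew and drift detection ultimately compare a serving-time distribution against a baseline. As a self-contained illustration of the statistic itself (not Vertex AI's implementation), a population stability index (PSI) check over one feature might look like:

```python
import math

def psi(baseline, production, bins=10):
    """Population stability index between two samples of one feature.
    Larger values mean more distribution drift (rule of thumb: > 0.2 is significant)."""
    lo = min(min(baseline), min(production))
    hi = max(max(baseline), max(production))
    width = (hi - lo) / bins or 1.0

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # a small floor keeps the log finite for empty bins
        return [max(c / len(sample), 1e-6) for c in counts]

    b, p = frac(baseline), frac(production)
    return sum((pi - bi) * math.log(pi / bi) for bi, pi in zip(b, p))

# identical distributions -> PSI near 0; shifted distribution -> large PSI
same = [i / 100 for i in range(100)]
shifted = [i / 100 + 0.5 for i in range(100)]
print(psi(same, same))      # ~0.0
print(psi(same, shifted))   # clearly above 0.2
```

A monitoring job effectively runs checks like this continuously and alerts when the score crosses a configured threshold.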
omermahgoub
1 year, 7 months ago
Selected Answer: C
Feature Attribution Skew and Drift Detection, this type of monitoring is useful in some cases, it requires access to the training and serving data for analysis. Since data cannot move to the cloud, Option D wouldn't be feasible. I vote for C. Create a Vertex AI Model Monitoring job. Enable training-serving skew detection for your model.
upvoted 1 times
omermahgoub
1 year, 6 months ago
I changed my answer to D
upvoted 2 times
...
...
pinimichele01
1 year, 7 months ago
Selected Answer: D
the answer cannot be C, cause your training data is not available in production.
upvoted 2 times
...
pikachu007
1 year, 10 months ago
Selected Answer: C
Option A and B: Vertex Explainable AI provides insights into model behavior but doesn't directly detect performance changes or concept drift. It's more suitable for understanding model decisions, not monitoring production performance. Option D: Feature attribution skew and drift detection requires feature attributions calculated during training, which might not be feasible without cloud access to the data.
upvoted 1 times
...

Topic 1 Question 248

Exam Professional Machine Learning Engineer topic 1 question 248 discussion

You trained a model, packaged it with a custom Docker container for serving, and deployed it to Vertex AI Model Registry. When you submit a batch prediction job, it fails with this error: "Error: model server never became ready. Please validate that your model file or container configuration are valid." There are no additional errors in the logs. What should you do?

  • A. Add a logging configuration to your application to emit logs to Cloud Logging
  • B. Change the HTTP port in your model’s configuration to the default value of 8080
  • C. Change the healthRoute value in your model’s configuration to /healthcheck
  • D. Pull the Docker image locally, and use the docker run command to launch it locally. Use the docker logs command to explore the error logs
Suggested Answer: D 🗳️

Comments

wences
1 year, 1 month ago
Selected Answer: B
From StackOverflow:"Validate the container configuration port; it should use port 8080. This configuration is important because Vertex AI sends liveness checks, health checks, and prediction requests to this port on the container. " Pulling the container to the local machine is like stepping back and saying, "It works on my computer," then solving the problem as it arises.
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: D
Isolating the Issue: Running the container locally helps determine if the problem originates from the container configuration or the Vertex AI deployment environment. If the container runs successfully locally, the issue likely lies with Vertex AI. Detailed Error Messages: Examining the container logs using docker logs provides detailed error messages specific to the container startup process. These messages can pinpoint the root cause of the model server failure, such as missing dependencies, incorrect model format, or resource limitations.
upvoted 3 times
...
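Option D's local debugging loop might look like the following sketch. The image name, tag, and health route are placeholders; the published port should match the container's serving configuration (8080 by default for Vertex AI):

```shell
# Pull the serving image that was deployed to Vertex AI Model Registry (placeholder name).
docker pull us-docker.pkg.dev/my-project/my-repo/model-server:latest

# Run it locally, publishing the port Vertex AI probes (8080 by default).
docker run -d --name model-server -p 8080:8080 \
  us-docker.pkg.dev/my-project/my-repo/model-server:latest

# Probe the container's health route, then inspect the startup logs --
# these usually contain the error hidden behind "model server never became ready".
curl -v http://localhost:8080/health
docker logs model-server
```

If the container also fails to start locally, the logs point at the root cause (missing dependency, bad model path, wrong port); if it starts fine, the problem is in the Vertex AI model configuration instead.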
omermahgoub
1 year, 7 months ago
Selected Answer: D
I vote for D. Pull the Docker image locally, and use the docker run command to launch it locally. Use the docker logs command to explore the error logs. Here's why: 1. Local Testing by running the Docker image locally to replicate the environment the model server encounters within Vertex AI. 2. Using docker logs allows to inspect the detailed error messages generated by the model server during startup. These logs might provide specific clues about the cause of the "model server never became ready" error.
upvoted 3 times
...
CMMC
1 year, 8 months ago
Selected Answer: B
When deploying a custom container to Vertex AI Model Registry, you need to follow some requirements for the container configuration. One of these requirements is to use HTTP port 8080 for serving predictions. If you use a different port, the model server might not be able to communicate with Vertex AI, causing the error "Error: model server never became ready". To fix this error, change the HTTP port in your model's configuration to the default value of 8080 and redeploy the container.
upvoted 1 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: D
My Answer: D A: Not correct: While logging can be helpful for monitoring and debugging, it won't directly address the issue of the model server not becoming ready. B: Not correct: The error message doesn't indicate a port issue, changing it preemptively might not resolve the underlying problem. C: Not correct: changing the health route, which could be helpful if the issue is related to health checks, but without further information, it's not the most conclusive option. D: CORRECT: This option allows you to simulate the deployment environment locally and inspect the logs directly, which can help diagnose the issue with the model server not becoming ready.
upvoted 3 times
...
Yan_X
1 year, 9 months ago
Selected Answer: C
Due to model size or other reasons, the container may not pass the health check before the timeout. https://cloud.google.com/knowledge/kb/unable-to-deploy-a-large-model-into-a-vertex-endpoint-000010439
upvoted 1 times
Yan_X
1 year, 8 months ago
I would revise my answer to D, as healthRoute should be defaulted to /healthcheck.
upvoted 2 times
...
...
vaibavi
1 year, 9 months ago
Selected Answer: B
Validate the container configuration port, it should use port 8080. This configuration is important because Vertex AI sends liveness checks, health checks, and prediction requests to this port on the container. https://www.appsloveworld.com/coding/flask/15/vertex-ai-deployment-failed
upvoted 1 times
...
sonicclasps
1 year, 9 months ago
Selected Answer: C
when not specifying the health check, the endpoint uses a default health check which only indicates if the http server is ready, not if the model is ready. https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#health
upvoted 2 times
...
pikachu007
1 year, 10 months ago
Selected Answer: D
Option A: Adding logging to Cloud Logging is useful for long-term monitoring but might not provide immediate insights for this specific error. Options B and C: Changing port and health check configuration might be necessary if incorrect, but local debugging often reveals the root cause more effectively.
upvoted 4 times
...

Topic 1 Question 249

Exam Professional Machine Learning Engineer topic 1 question 249 discussion

You are developing an ML model to identify your company’s products in images. You have access to over one million images in a Cloud Storage bucket. You plan to experiment with different TensorFlow models by using Vertex AI Training. You need to read images at scale during training while minimizing data I/O bottlenecks. What should you do?

  • A. Load the images directly into the Vertex AI compute nodes by using Cloud Storage FUSE. Read the images by using the tf.data.Dataset.from_tensor_slices function
  • B. Create a Vertex AI managed dataset from your image data. Access the AIP_TRAINING_DATA_URI environment variable to read the images by using the tf.data.Dataset.list_files function.
  • C. Convert the images to TFRecords and store them in a Cloud Storage bucket. Read the TFRecords by using the tf.data.TFRecordDataset function.
  • D. Store the URLs of the images in a CSV file. Read the file by using the tf.data.experimental.CsvDataset function.
Suggested Answer: C 🗳️

Comments

pikachu007
Highly Voted 1 year, 4 months ago
Selected Answer: C
Option A: Cloud Storage FUSE can be slower for large datasets and adds complexity. Option B: Vertex AI managed datasets offer convenience but might not match TFRecord performance for large-scale image training. Option D: CSV files require manual loading and parsing, increasing overhead.
upvoted 5 times
...
tavva_prudhvi
Highly Voted 12 months ago
Selected Answer: C
TFRecords is a binary storage format optimized for TensorFlow. By storing images as TFRecords, you can improve the I/O efficiency as the data is serialized and can be efficiently loaded off-disk in a batched manner. TFRecordDataset is specifically designed for reading these files efficiently, which helps in minimizing I/O bottlenecks. This approach is typically recommended for large-scale image datasets as it ensures data is read efficiently in a manner suitable for distributed training.
upvoted 5 times
...
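Part of why TFRecords minimize I/O is the format itself: records are length-prefixed and checksummed, so a reader streams them sequentially with no per-image file-open overhead. A dependency-free sketch of that on-disk framing, for illustration only (real pipelines should use tf.io.TFRecordWriter and tf.data.TFRecordDataset; the CRC masking follows the TFRecord format's convention):

```python
import io
import struct

def _crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), the checksum TFRecord framing uses."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def _masked_crc(data: bytes) -> int:
    crc = _crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF

def write_records(stream, records):
    """Write records in TFRecord framing: length, crc(length), data, crc(data)."""
    for data in records:
        length = struct.pack("<Q", len(data))
        stream.write(length)
        stream.write(struct.pack("<I", _masked_crc(length)))
        stream.write(data)
        stream.write(struct.pack("<I", _masked_crc(data)))

def read_records(stream):
    """Stream records back out sequentially -- no random access needed."""
    while True:
        header = stream.read(8)
        if not header:
            return
        (length,) = struct.unpack("<Q", header)
        stream.read(4)                    # length checksum (not verified in this sketch)
        data = stream.read(length)
        stream.read(4)                    # data checksum (not verified in this sketch)
        yield data

buf = io.BytesIO()
write_records(buf, [b"image-bytes-1", b"image-bytes-2"])
buf.seek(0)
print(list(read_records(buf)))  # [b'image-bytes-1', b'image-bytes-2']
```

Batching many small images into a few large TFRecord files like this is what turns millions of random Cloud Storage reads into a handful of sequential ones.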
gscharly
Most Recent 1 year ago
Selected Answer: C
agree with pikachu007
upvoted 1 times
...
fitri001
1 year ago
Selected Answer: A
Read the images by using the tf.data.Dataset.from_tensor_slices function. Here's why this option is most efficient: Cloud Storage FUSE: This mounts your Cloud Storage bucket directly to the training VM, allowing on-demand access to image data as local files. It minimizes network overhead and data transfer compared to downloading the entire dataset beforehand. tf.data.Dataset.from_tensor_slices: This function is suitable for reading data directly from memory. Since Cloud Storage FUSE presents the images as local files, you can leverage this function for efficient data access within your training script.
upvoted 1 times
fitri001
1 year ago
B. Vertex AI Managed Dataset: While managed datasets offer convenience, accessing them might involve additional network overhead compared to Cloud Storage FUSE. C. TFRecords: Converting images to TFRecords can be an additional processing step, potentially introducing I/O overhead. While TFRecord format might be efficient for some models, it's not strictly necessary for minimizing I/O during data access. D. CSV with Image URLs: Reading image URLs from a CSV and fetching each image individually creates significant network traffic, leading to I/O bottlenecks. It's less efficient than directly accessing the images through Cloud Storage FUSE.
upvoted 1 times
fitri001
1 year ago
TensorFlow Datasets (TFDs): Consider implementing TFDs within your training script. They offer functionalities like parallelized data loading and on-the-fly data augmentation to further optimize training efficiency. Preprocessing and Caching: Preprocess data (resizing, normalization) within your TFD pipeline or training script. Cache preprocessed data locally on the VM to avoid redundant processing during training iterations.
upvoted 1 times
...
...
...
felipepin
1 year, 2 months ago
Selected Answer: C
The TFRecord format is a simple format for storing a sequence of binary records. Protocol buffers are a cross-platform, cross-language library for efficient serialization of structured data.
upvoted 2 times
...

Topic 1 Question 250

Exam Professional Machine Learning Engineer topic 1 question 250 discussion

You work at an ecommerce startup. You need to create a customer churn prediction model. Your company’s recent sales records are stored in a BigQuery table. You want to understand how your initial model is making predictions. You also want to iterate on the model as quickly as possible while minimizing cost. How should you build your first model?

  • A. Export the data to a Cloud Storage bucket. Load the data into a pandas DataFrame on Vertex AI Workbench and train a logistic regression model with scikit-learn.
  • B. Create a tf.data.Dataset by using the TensorFlow BigQueryClient. Implement a deep neural network in TensorFlow.
  • C. Prepare the data in BigQuery and associate the data with a Vertex AI dataset. Create an AutoMLTabularTrainingJob to train a classification model.
  • D. Export the data to a Cloud Storage bucket. Create a tf.data.Dataset to read the data from Cloud Storage. Implement a deep neural network in TensorFlow.
Suggested Answer: C 🗳️

Comments

PhilipKoku
Highly Voted 11 months ago
Selected Answer: C
C) Data preparation in BigQuery. Ease of implementation with AutoML
upvoted 6 times
...
OpenKnowledge
Most Recent 1 month, 2 weeks ago
Selected Answer: C
AutoML enables you to quickly iterate on the model while minimizing cost. In addition to that, for AutoML tabular and image classification models, Vertex Explainable AI configures itself automatically, so no specific additional configuration is required beyond enabling explanations when deploying the model.
upvoted 1 times
...
b7ad1d9
1 month, 2 weeks ago
Selected Answer: A
The answer is surely A. C does not make sense because the question says "You want to understand how your initial model is making predictions.". AutoML is a black box for explainability.
upvoted 1 times
...
fitri001
1 year ago
Selected Answer: C
Cost-Effectiveness: Leverages BigQuery for data storage and preprocessing, minimizing data movement costs. Utilizes Vertex AI's AutoML Tabular training, which is a pay-per-use service, reducing upfront costs compared to custom training environments. Rapid Iteration: AutoML Tabular automates feature engineering and model selection, allowing you to experiment with various configurations quickly. You can focus on refining feature engineering and interpreting model behavior based on AutoML's generated explanations.
upvoted 1 times
fitri001
1 year ago
why not B? Implementing a deep neural network from scratch requires significant development effort and might be overkill for an initial model. Interpretability of deep neural networks can also be challenging. While TensorFlow BigQueryClient allows data access, it requires writing custom training scripts, increasing development time.
upvoted 1 times
...
...
omermahgoub
1 year ago
Selected Answer: C
upvoted 1 times
...
Carlose2108
1 year, 2 months ago
Selected Answer: C
I went Option C
upvoted 1 times
...
pikachu007
1 year, 4 months ago
Selected Answer: C
Option A: While logistic regression is interpretable, manual training in Vertex AI Workbench adds time and complexity. Options B and D: Deep neural networks can be powerful but often lack interpretability, making it challenging to understand model decisions. They also require more hands-on model development and infrastructure management.
upvoted 4 times
...
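A minimal sketch of option C with the Vertex AI Python SDK. Project, table, and column names are placeholders, and the function is only defined (not called) here, since running it requires google-cloud-aiplatform and real GCP credentials:

```python
def train_churn_automl(project: str, bq_table: str, target_column: str):
    """Create a Vertex AI tabular dataset from BigQuery and train with AutoML."""
    from google.cloud import aiplatform  # lazy import: needs the SDK installed

    aiplatform.init(project=project, location="us-central1")
    dataset = aiplatform.TabularDataset.create(
        display_name="churn-data",
        bq_source=f"bq://{bq_table}",
    )
    job = aiplatform.AutoMLTabularTrainingJob(
        display_name="churn-automl",
        optimization_prediction_type="classification",
    )
    # budget_milli_node_hours caps training cost (1000 = roughly one node hour).
    return job.run(dataset=dataset, target_column=target_column,
                   budget_milli_node_hours=1000)

# Example (needs a real project):
# model = train_churn_automl("my-project", "my-project.sales.records", "churned")
```

The training budget cap plus AutoML's built-in feature attributions address the cost and interpretability requirements without hand-building a model.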

Topic 1 Question 251

Exam Professional Machine Learning Engineer topic 1 question 251 discussion

You are developing a training pipeline for a new XGBoost classification model based on tabular data. The data is stored in a BigQuery table. You need to complete the following steps:

1. Randomly split the data into training and evaluation datasets in a 65/35 ratio
2. Conduct feature engineering
3. Obtain metrics for the evaluation dataset
4. Compare models trained in different pipeline executions

How should you execute these steps?

  • A. 1. Using Vertex AI Pipelines, add a component to divide the data into training and evaluation sets, and add another component for feature engineering.
    2. Enable autologging of metrics in the training component.
    3. Compare pipeline runs in Vertex AI Experiments.
  • B. 1. Using Vertex AI Pipelines, add a component to divide the data into training and evaluation sets, and add another component for feature engineering.
    2. Enable autologging of metrics in the training component.
    3. Compare models using the artifacts’ lineage in Vertex ML Metadata.
  • C. 1. In BigQuery ML, use the CREATE MODEL statement with BOOSTED_TREE_CLASSIFIER as the model type and use BigQuery to handle the data splits.
    2. Use a SQL view to apply feature engineering and train the model using the data in that view.
    3. Compare the evaluation metrics of the models by using a SQL query with the ML.TRAINING_INFO statement.
  • D. 1. In BigQuery ML, use the CREATE MODEL statement with BOOSTED_TREE_CLASSIFIER as the model type and use BigQuery to handle the data splits.
    2. Use the TRANSFORM clause to specify the feature engineering transformations and train the model using the data in the table.
    3. Compare the evaluation metrics of the models by using a SQL query with the ML.TRAINING_INFO statement.
Suggested Answer: A 🗳️
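Step 1 of the suggested pipeline (a reproducible 65/35 split) is commonly implemented by hashing a stable row key rather than calling a random generator, so every pipeline execution assigns each row to the same set. A minimal stdlib sketch, assuming a hypothetical `row_id` key column:

```python
import hashlib

def assign_split(row_id: str, train_fraction: float = 0.65) -> str:
    """Deterministically assign a row to 'train' or 'eval' by hashing its key."""
    digest = hashlib.md5(row_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return "train" if bucket < train_fraction * 100 else "eval"

# Example: split 10,000 synthetic row ids.
splits = [assign_split(f"row-{i}") for i in range(10_000)]
train_share = splits.count("train") / len(splits)
print(round(train_share, 2))  # close to 0.65
```

Because the assignment depends only on the key, re-running the pipeline (or comparing runs in Vertex AI Experiments) never shuffles rows between the training and evaluation sets.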

Comments

pikachu007
Highly Voted 1 year, 10 months ago
Selected Answer: A
Option B: While Vertex ML Metadata provides artifact lineage, it's less comprehensive for model comparison than Experiments. Options C and D: BigQuery ML is powerful for in-database model training, but it has limitations in pipeline orchestration, complex feature engineering, and detailed model comparison features, making it less suitable for this scenario.
upvoted 9 times
...
wences
Most Recent 1 year, 1 month ago
Selected Answer: A
Can anyone give a good reason for the answers without using ChatGPT or Gemini?
upvoted 1 times
...
tardigradum
1 year, 2 months ago
Selected Answer: A
BQ ML falls a bit short when it comes to building pipelines that include feature engineering and experiment comparison (it's better to use Vertex Pipelines and do the comparisons using Vertex Experiments).
upvoted 2 times
...
fitri001
1 year, 6 months ago
Selected Answer: A
Flexibility and Control: Vertex AI Pipelines allow you to define a custom pipeline with separate components for data splitting, feature engineering, and XGBoost training using your preferred libraries (like BigQueryClient and xgboost). This provides more control and customization compared to BigQuery ML's limited model types and functionality. Feature Engineering and Data Splitting: Separate components enable clear separation of concerns and potentially parallel execution for efficiency. Autologging and Model Comparison: Vertex AI autologging simplifies capturing evaluation metrics during training. Vertex AI Experiments offer a centralized interface to compare metrics across different pipeline runs (potentially with varying hyperparameter configurations).
upvoted 1 times
fitri001
1 year, 6 months ago
why not C & D? C & D. BigQuery ML: While BigQuery ML offers some XGBoost functionality, it has limitations: Limited Model Types: BigQuery ML doesn't provide the full flexibility of using custom XGBoost libraries with advanced configurations. Less Control over Feature Engineering: Feature engineering using SQL views might be restrictive compared to a dedicated component in Vertex AI Pipelines. Limited Model Comparison: While ML.TRAINING_INFO provides some insights, Vertex AI Experiments offer a more comprehensive view for comparing models across pipeline runs.
upvoted 1 times
...
...
pinimichele01
1 year, 7 months ago
Selected Answer: A
see b1a8fae
upvoted 1 times
...
omermahgoub
1 year, 7 months ago
Selected Answer: A
A: Leverage Vertex AI Pipelines and Experiments
upvoted 1 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: A
My Answer: A A: CORRECT: It involves proper data splitting into training and evaluation sets and conducting feature engineering within the pipeline, fulfilling steps 1 and 2. Enabling autologging of metrics ensures that you can track and compare the performance of different model executions, fulfilling step 3. B: Not Correct: Better use Vertex AI Experiments C and D: Not Correct: BigQuery ML lacks functionalities for comparing models across pipeline runs. You would need to rely on external tools or custom scripts to extract and compare evaluation metrics, making the process less streamlined.
upvoted 2 times
...
b1a8fae
1 year, 9 months ago
Selected Answer: A
Compare models in different pipeline executions -> go for Vertex AI experiments
upvoted 3 times
...

Topic 1 Question 252


You work for a company that sells corporate electronic products to thousands of businesses worldwide. Your company stores historical customer data in BigQuery. You need to build a model that predicts customer lifetime value over the next three years. You want to use the simplest approach to build the model and you want to have access to visualization tools. What should you do?

  • A. Create a Vertex AI Workbench notebook to perform exploratory data analysis. Use IPython magics to create a new BigQuery table with input features. Use the BigQuery console to run the CREATE MODEL statement. Validate the results by using the ML.EVALUATE and ML.PREDICT statements.
  • B. Run the CREATE MODEL statement from the BigQuery console to create an AutoML model. Validate the results by using the ML.EVALUATE and ML.PREDICT statements.
  • C. Create a Vertex AI Workbench notebook to perform exploratory data analysis and create input features. Save the features as a CSV file in Cloud Storage. Import the CSV file as a new BigQuery table. Use the BigQuery console to run the CREATE MODEL statement. Validate the results by using the ML.EVALUATE and ML.PREDICT statements.
  • D. Create a Vertex AI Workbench notebook to perform exploratory data analysis. Use IPython magics to create a new BigQuery table with input features, create the model, and validate the results by using the CREATE MODEL, ML.EVALUATE, and ML.PREDICT statements.
Suggested Answer: D 🗳️
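For reference, the BigQuery ML statements named in option D look like the following. The dataset, table, model, and label names here are hypothetical placeholders, and in a Workbench notebook each string would be executed through the `%%bigquery` cell magic:

```python
# Hypothetical dataset/model/label names; the real ones depend on your project.
MODEL = "mydataset.customer_ltv_model"
FEATURES_TABLE = "mydataset.customer_features"

create_model_sql = f"""
CREATE OR REPLACE MODEL `{MODEL}`
OPTIONS (model_type = 'linear_reg', input_label_cols = ['ltv_3y']) AS
SELECT * FROM `{FEATURES_TABLE}`
"""

evaluate_sql = f"SELECT * FROM ML.EVALUATE(MODEL `{MODEL}`)"
predict_sql = f"SELECT * FROM ML.PREDICT(MODEL `{MODEL}`, TABLE `{FEATURES_TABLE}`)"
```

Running these from IPython magics keeps exploration, training, and validation in one notebook, which is the "simplest approach with visualization tools" the question asks for.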

Comments

rcapj
Highly Voted 1 year, 4 months ago
D Vertex AI Workbench notebook: Provides an environment for data analysis, model building, and visualization tools all in one place. IPython magics: Allows seamless interaction with BigQuery for data exploration and feature creation directly within the notebook. CREATE MODEL statement: Enables model creation within the notebook environment, simplifying the workflow. ML.EVALUATE and ML.PREDICT statements: Facilitate model validation directly within the notebook for assessing performance.
upvoted 6 times
...
pertoise
Highly Voted 1 year, 8 months ago
Option B because there's no mention of "flexibility". Easy access to viz tools with Looker
upvoted 6 times
...
OpenKnowledge
Most Recent 1 month, 2 weeks ago
Selected Answer: D
Although option B provides the simplest model (i.e., AutoML), it doesn't provide a visualization tool; because BigQuery ML itself does not have a built-in, comprehensive visualization tool for exploring models or their results. However, it integrates seamlessly with various visualization tools within the Google Cloud ecosystem and beyond. So, B is not an option for it. The other three options uses Vertex AI Workbench which will have access to comprehensive Visualization tools. Among the three options, D is the simplest approach which uses iPython magics (or simply, magics) for interactive shell
upvoted 2 times
...
Antmal
3 months ago
Selected Answer: D
I think D is the best and simplest approach. Here’s why: 1. Centralised Workflow: Everything from data exploration to model validation happens within a single Vertex AI notebook. This is clean, organised, and easily reproducible. 2. Powerful Visualisations: Notebooks allow you to use powerful Python libraries like Matplotlib, Seaborn, and Plotly for rich, interactive EDA. 3. Seamless BigQuery Integration: Using IPython magics (e.g., %%bigquery) allows you to run SQL queries directly on your BigQuery data from within the notebook, making it easy to create feature tables, train models (CREATE MODEL), and evaluate them (ML.EVALUATE) without ever leaving the notebook environment.
upvoted 2 times
...
nnn245bbb
6 months ago
Selected Answer: D
Based on ChatGPT answer
upvoted 1 times
...
NithishReddyNY
7 months, 1 week ago
Selected Answer: D
Option D provides the best balance of simplicity and meeting all requirements. It allows for visualization and EDA in Vertex AI Workbench and uses the simplest modeling approach (BQML via SQL commands) executed directly from the notebook environment using IPython magics, creating a cohesive and straightforward workflow.
upvoted 1 times
...
Wuthuong1234
8 months, 1 week ago
Selected Answer: B
The correct answer is probably B. AutoML does the feature engineering for you, so it requires the least amount of effort. For all other options, you have to do the exploration and feature engineering yourself, which will take a lot of time. In a real world scenario you'd expect to do a bit of EDA to identify and deal with missing values and to check the data quality though... even if you go for the B option.
upvoted 1 times
...
Dirtie_Sinkie
1 year, 1 month ago
Selected Answer: D
Going for D
upvoted 2 times
...
andymetzen
1 year, 2 months ago
Option D is the answer given by an official Google trainer.
upvoted 2 times
...
tardigradum
1 year, 2 months ago
Simple training and integration with visualization tools = BQ
upvoted 1 times
...
LaxmanTiwari
1 year, 4 months ago
Selected Answer: B
As requested :" simplest approach", the option B is the best choice.
upvoted 2 times
...
omermahgoub
1 year, 7 months ago
B. Use Bigquery ML Features to create, evaluate and predict
upvoted 3 times
...
daidai75
1 year, 9 months ago
Selected Answer: B
As requested :" simplest approach", the option B is the best choice.
upvoted 2 times
...
b1a8fae
1 year, 9 months ago
Selected Answer: B
Forgot to vote.
upvoted 1 times
...
b1a8fae
1 year, 9 months ago
Simplest approach that allows visualization is option B.
upvoted 2 times
...
winston9
1 year, 9 months ago
Selected Answer: B
all the other options create a new BQ table, I don't think it's needed.
upvoted 1 times
...
pikachu007
1 year, 10 months ago
Selected Answer: A
Option B: While AutoML simplifies model selection and training, it lacks the flexibility and visualization capabilities of Vertex AI Workbench. Option C: Manually saving features as CSV files and importing them back into BigQuery involves unnecessary data movement and complexity. Option D: Completing all steps within the notebook is possible but requires more coding and might not be as intuitive for those less familiar with BigQuery ML syntax.
upvoted 2 times
...

Topic 1 Question 253


You work for a delivery company. You need to design a system that stores and manages features such as parcels delivered and truck locations over time. The system must retrieve the features with low latency and feed those features into a model for online prediction. The data science team will retrieve historical data at a specific point in time for model training. You want to store the features with minimal effort. What should you do?

  • A. Store features in Bigtable as key/value data.
  • B. Store features in Vertex AI Feature Store.
  • C. Store features as a Vertex AI dataset, and use those features to train the models hosted in Vertex AI endpoints.
  • D. Store features in BigQuery timestamp partitioned tables, and use the BigQuery Storage Read API to serve the features.
Suggested Answer: B 🗳️
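The "historical data at a specific point in time" requirement is exactly what Vertex AI Feature Store's point-in-time lookups provide out of the box. The semantics can be illustrated with a small stdlib sketch (the entity ids, timestamps, and values below are made up):

```python
from bisect import bisect_right

# Hypothetical feature history: (timestamp, value) pairs per entity, sorted by time.
history = {
    "truck-17": [(1000, "depot"), (1600, "route-A"), (2200, "route-B")],
}

def feature_at(entity_id: str, ts: int):
    """Return the latest feature value written at or before `ts`
    (the point-in-time semantics a feature store provides for training data)."""
    rows = history[entity_id]
    idx = bisect_right(rows, (ts, chr(0x10FFFF)))  # sentinel sorts after real values
    return rows[idx - 1][1] if idx else None

print(feature_at("truck-17", 1800))  # value as of t=1800 -> "route-A"
```

A feature store does this lookup for you across millions of entities (plus low-latency online serving for prediction), which is why it is less effort than hand-rolling it on Bigtable or BigQuery.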

Comments

guilhermebutzke
Highly Voted 1 year, 2 months ago
Selected Answer: B
My Answer: B Vertex AI Feature Store because of these: “must retrieve the features with low latency” ,“retrieve historical data at a specific point in time”, and “ store the features with minimal effort”
upvoted 5 times
...
Rafa1312
Most Recent 3 weeks, 5 days ago
Selected Answer: A
I will go with A. B - Does not say anything about visualization. BQ Cannot be used for visualization C - Complex, Converting to CSV, storage etc D - Very close to A, Everything is being done in notebook, They asked an easy way. BQ Console is perfect for this. I will go with A
upvoted 1 times
...
OpenKnowledge
1 month, 2 weeks ago
Selected Answer: B
For low-latency "online serving", Vertex AI Feature Store offers different online serving options, including Optimized online serving and Bigtable online serving. These options are designed to handle high-throughput and low-latency feature retrieval for real-time predictions. Optimized online serving can provide lower latencies and supports embeddings management, while Bigtable online serving is useful for serving large data volumes (terabytes). For "offline or batch serving", Vertex AI Feature Store leverages BigQuery for offline storage of feature values. BigQuery is a highly scalable data warehouse capable of storing and querying petabytes of data, making it suitable for large-scale batch serving for model training or offline predictions.
upvoted 1 times
...
CHARLIE2108
1 year, 3 months ago
Selected Answer: B
I agree with dadai75
upvoted 1 times
...
daidai75
1 year, 3 months ago
Selected Answer: B
As required: "minimal effort" and "low latency", the Option B is the best choice.
upvoted 1 times
...
b1a8fae
1 year, 3 months ago
Selected Answer: B
Vertex AI Feature Store is optimized for ultra-low latency serving
upvoted 1 times
...
winston9
1 year, 4 months ago
Selected Answer: B
Feature store allows point in time retrieval
upvoted 1 times
...
winston9
1 year, 4 months ago
This is B
upvoted 1 times
...

Topic 1 Question 254


You are working on a prototype of a text classification model in a managed Vertex AI Workbench notebook. You want to quickly experiment with tokenizing text by using a Natural Language Toolkit (NLTK) library. How should you add the library to your Jupyter kernel?

  • A. Install the NLTK library from a terminal by using the pip install nltk command.
  • B. Write a custom Dataflow job that uses NLTK to tokenize your text and saves the output to Cloud Storage.
  • C. Create a new Vertex AI Workbench notebook with a custom image that includes the NLTK library.
  • D. Install the NLTK library from a Jupyter cell by using the !pip install nltk --user command.
Suggested Answer: D 🗳️
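Option D's cell command works because `!pip install nltk --user` runs against the environment behind the notebook. A slightly more explicit notebook pattern pins the install to the interpreter backing the current kernel; this sketch only builds the command rather than executing it:

```python
import sys

# Build the install command against the interpreter backing this Jupyter kernel,
# so the package lands in the environment the notebook actually uses.
cmd = [sys.executable, "-m", "pip", "install", "--user", "nltk"]

# In a notebook cell you would run it, e.g.:
#   import subprocess; subprocess.check_call(cmd)
# and then fetch tokenizer data:
#   import nltk; nltk.download("punkt")
print(" ".join(cmd[1:]))  # -m pip install --user nltk
```

Either form avoids building a custom container image just to experiment with one library.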

Comments

pikachu007
Highly Voted 1 year, 10 months ago
Selected Answer: D
Direct Installation: It installs the library directly within the notebook environment, making it immediately available for use. Simplicity: It requires a single command in a Jupyter cell, eliminating the need for external tools or configuration. User-Specific Installation: The --user flag ensures the library is installed in your user space, avoiding conflicts with system-wide packages.
upvoted 8 times
...
forport
Most Recent 1 year, 3 months ago
Selected Answer: D
Right command : !pip install nltk --user
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: D
Efficiency: It allows installation directly within your notebook cell, minimizing setup time compared to creating a custom image or using an external terminal. User-Level Installation: Using --user ensures the library is installed within your user environment, avoiding conflicts with system-wide installations or impacting other users.
upvoted 1 times
fitri001
1 year, 6 months ago
A. Terminal Installation: While possible if allowed, it requires switching contexts outside the notebook and might not be permitted in managed environments. B. Dataflow Job: A Dataflow job is an overkill for simple library usage within a notebook. It's designed for large-scale data processing pipelines. C. Custom Image: Creating a custom image with NLTK requires additional development effort and can be time-consuming for quick experimentation.
upvoted 1 times
...
...
tavva_prudhvi
1 year, 9 months ago
Selected Answer: D
This command installs the NLTK library directly from within your Jupyter notebook, allowing you to quickly proceed with your text tokenization experiments without needing to manage Docker images or set up external data processing jobs. The `--user` flag ensures that the library is installed in the user's space, avoiding potential conflicts with system-wide packages.
upvoted 3 times
...

Topic 1 Question 255


You have recently used TensorFlow to train a classification model on tabular data. You have created a Dataflow pipeline that can transform several terabytes of data into training or prediction datasets consisting of TFRecords. You now need to productionize the model, and you want the predictions to be automatically uploaded to a BigQuery table on a weekly schedule. What should you do?

  • A. Import the model into Vertex AI and deploy it to a Vertex AI endpoint. On Vertex AI Pipelines, create a pipeline that uses the DataflowPythonJobOp and the ModelBatchPredictOp components.
  • B. Import the model into Vertex AI and deploy it to a Vertex AI endpoint. Create a Dataflow pipeline that reuses the data processing logic, sends requests to the endpoint, and then uploads predictions to a BigQuery table.
  • C. Import the model into Vertex AI. On Vertex AI Pipelines, create a pipeline that uses the DataflowPythonJobOp and the ModelBatchPredictOp components.
  • D. Import the model into BigQuery. Implement the data processing logic in a SQL query. On Vertex AI Pipelines, create a pipeline that uses the BigqueryQueryJobOp and the BigqueryPredictModelJobOp components.
Suggested Answer: C 🗳️
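A sketch of the kind of arguments the batch-prediction step would receive to write predictions straight to BigQuery. The parameter names follow the google_cloud_pipeline_components documentation for ModelBatchPredictOp but should be verified against your installed version; the project and table names are hypothetical:

```python
# Illustrative arguments for the batch-prediction step of the pipeline
# (parameter names follow ModelBatchPredictOp; project and table are made up).
batch_predict_params = {
    "job_display_name": "weekly-tfrecord-batch-predict",
    "instances_format": "tf-record",      # the Dataflow step emits TFRecords
    "predictions_format": "bigquery",     # write results straight to BigQuery
    "bigquery_destination_output_uri": "bq://my-project.predictions.weekly",
}
# Scheduling the pipeline to run weekly then satisfies the "automatically
# uploaded to a BigQuery table on a weekly schedule" requirement without an
# endpoint, since batch prediction does not need one.
```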

Comments

BlehMaks
Highly Voted 1 year, 9 months ago
Selected Answer: C
The DataflowPythonJobOp operator lets you create a Vertex AI Pipelines component that prepares data by submitting a Python-based Apache Beam job to Dataflow for execution. https://cloud.google.com/vertex-ai/docs/pipelines/dataflow-component#dataflowpythonjobop Using ModelBatchPredictOp we can specify an output location for Vertex AI to store prediction results https://cloud.google.com/vertex-ai/docs/pipelines/batchprediction-component A - is incorrect since we don't need an endpoint for batch predictions B - creating a new Dataflow pipeline is redundant
upvoted 10 times
...
OpenKnowledge
Most Recent 1 month, 2 weeks ago
Selected Answer: C
ModelBatchPredictOp can upload predictions to a BigQuery table. It is a component within a Vertex AI pipeline that executes a batch prediction job, and you can specify a BigQuery table as the destination for the output. To direct the batch prediction output to BigQuery, you specify the destination in the parameters for the ModelBatchPredictOp component.
upvoted 2 times
...
b7ad1d9
1 month, 2 weeks ago
Selected Answer: C
Endpoint = more for online predictions. For batch predictions, don't need an endpoint.
upvoted 1 times
...
Begum
5 months, 3 weeks ago
Selected Answer: B
Predictions are required to be uploaded to the BQ.
upvoted 2 times
...
lunalongo
11 months, 3 weeks ago
Selected Answer: C
C is the best option because it uses: 1) Vertex AI Pipelines for orchestrating the flow (managed and scalable). 2) DataflowPythonJobOp for prep and ModelBatchPredictOp for batch predictions on Vertex AI. *A deploys the model to a Vertex AI endpoint, inefficient for batch jobs! *B uses a single Dataflow pipeline, which needs custom Vertex AI and BQ integration. *D uses BigQuery, a datawarehouse, for model deployment and prediction.
upvoted 1 times
...
AK2020
1 year, 3 months ago
Selected Answer: B
Uploading predictions directly to BigQuery from the Dataflow pipeline integrates seamlessly with your data storage.
upvoted 2 times
...
AzureDP900
1 year, 4 months ago
B is right because 1)You've already trained a classification model using TensorFlow, so you need to productionize it by deploying it to a Vertex AI endpoint. 2)To automate the prediction process on a weekly schedule, you can create a Dataflow pipeline that reuses your existing data processing logic. This pipeline will send requests to the deployed model for inference and then upload the predicted results to BigQuery.
upvoted 2 times
...
Prakzz
1 year, 4 months ago
Selected Answer: B
Only option B talks about loading the data to BigQuery
upvoted 2 times
...
rcapj
1 year, 4 months ago
B Vertex AI Deployment: Vertex AI provides a managed environment for deploying machine learning models. It simplifies the process and ensures scalability. Dataflow Pipeline Reuse: Reusing the existing Dataflow pipeline for data processing leverages your existing code and avoids redundant logic. Model Endpoint Predictions: Sending requests to the deployed model endpoint allows for efficient prediction generation. BigQuery Upload: Uploading predictions directly to BigQuery from the Dataflow pipeline integrates seamlessly with your data storage.
upvoted 3 times
...
gscharly
1 year, 6 months ago
Selected Answer: C
No need to deploy to endpoint as we need batch predictions. ModelBatchPredictOp can upload data to BQ. Dataflow pipeline logic can be implemented in DataflowPythonJobOp
upvoted 4 times
...
fitri001
1 year, 6 months ago
Selected Answer: B
TFRecords is a specific file format designed by TensorFlow for storing data in a way that's efficient for the machine learning framework.
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: B
Option A: Vertex AI Pipelines' ModelBatchPredictOp is designed for batch prediction within pipelines, not for serving models through an endpoint. Option C: Importing the model directly into BigQuery is not feasible for TensorFlow models. Option D: Vertex AI Pipelines' BigqueryPredictModelJobOp assumes the model is already trained and hosted in BigQuery ML, which isn't the case here.
upvoted 2 times
pinimichele01
1 year, 6 months ago
Importing the model directly into BigQuery is not feasible for TensorFlow models. -> not true
upvoted 3 times
...
...
pinimichele01
1 year, 7 months ago
Selected Answer: C
ModelBatchPredictOp -> upload automatically on BQ No need for endpoint --> C
upvoted 2 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: C
agree with BlehMaks
upvoted 1 times
...
pertoise
1 year, 8 months ago
Answer is C. No need for an endpoint here : Simply specify the BigQuery table URI in the ModelBatchPredictOp parameter and you're done automatically uploading to BigQuery
upvoted 3 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: B
My Answer: B The most complete answer, and reuse a created pipeline. Don’t make sense to use DataflowPythonJobOp when you have already created a dataflow pipeline that does the same.
upvoted 3 times
...
tavva_prudhvi
1 year, 9 months ago
Selected Answer: B
Not A, C as they does not explicitly mention how the predictions will be uploaded to BigQuery.
upvoted 1 times
...

Topic 1 Question 256


You work for an online grocery store. You recently developed a custom ML model that recommends a recipe when a user arrives at the website. You chose the machine type on the Vertex AI endpoint to optimize costs by using the queries per second (QPS) that the model can serve, and you deployed it on a single machine with 8 vCPUs and no accelerators.

A holiday season is approaching and you anticipate four times more traffic during this time than the typical daily traffic. You need to ensure that the model can scale efficiently to the increased demand. What should you do?

  • A. 1. Maintain the same machine type on the endpoint.
    2. Set up a monitoring job and an alert for CPU usage.
    3. If you receive an alert, add a compute node to the endpoint.
  • B. 1. Change the machine type on the endpoint to have 32 vCPUs.
    2. Set up a monitoring job and an alert for CPU usage.
    3. If you receive an alert, scale the vCPUs further as needed.
  • C. 1. Maintain the same machine type on the endpoint. Configure the endpoint to enable autoscaling based on vCPU usage.
    2. Set up a monitoring job and an alert for CPU usage.
    3. If you receive an alert, investigate the cause.
  • D. 1. Change the machine type on the endpoint to have a GPU. Configure the endpoint to enable autoscaling based on the GPU usage.
    2. Set up a monitoring job and an alert for GPU usage.
    3. If you receive an alert, investigate the cause.
Suggested Answer: C 🗳️
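Option C's configuration maps onto the deploy-time settings of the Vertex AI Python SDK. The dictionary below sketches plausible values; the keyword names mirror `Model.deploy()`, and the replica counts are illustrative assumptions:

```python
# Deploy-time settings behind option C (keyword names mirror the Vertex AI
# Python SDK's Model.deploy(); the replica counts are illustrative).
deploy_kwargs = {
    "machine_type": "n1-standard-8",  # keep the current 8-vCPU machine type
    "min_replica_count": 1,           # normal daily traffic
    "max_replica_count": 5,           # headroom for ~4x holiday traffic
}
# With autoscaling enabled, Vertex AI adds replicas as CPU utilization on the
# existing nodes rises, instead of requiring a manual machine-type change.
```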

Comments

fitri001
Highly Voted 1 year, 6 months ago
Selected Answer: C
Option A: Manually adding compute nodes after an alert might lead to delays and potential outages during peak traffic. Option B: Upgrading to 32 vCPUs upfront might be an overkill if the current machine type with 8 vCPUs can handle the typical daily traffic. Vertical scaling (more vCPUs) might be suitable only if the model can benefit from additional CPU power. Option D: Using a GPU is unlikely to benefit a recipe recommendation model, which likely doesn't involve intensive graphical processing. Additionally, monitoring GPU usage wouldn't be relevant.
upvoted 8 times
...
lunalongo
Most Recent 11 months, 3 weeks ago
Selected Answer: C
C) Option C is the best because: 1) It leverages the built-in autoscaling capabilities of Vertex AI. 2) It's the most efficient/cost-effective solution for fluctuating traffic. 2) Manually scaling (options A and B) is reactive and inefficient 3) A GPU is unnecessary, there is no intensive graphical processing
upvoted 2 times
...
AzureDP900
1 year, 4 months ago
C is right because 1)Since you've already optimized your model's deployment on a single machine with 8 vCPUs, it makes sense to maintain the same machine type to avoid any potential performance issues. 2)Enabling autoscaling based on vCPU usage will allow your endpoint to automatically add more machines as needed to handle the increased traffic during the holiday season. This approach is more efficient and cost-effective than scaling up individual machines or adding new machines manually. 3)Monitoring CPU usage with a job and alerting when thresholds are exceeded allows you to detect potential issues before they impact performance.
upvoted 2 times
...
omermahgoub
1 year, 7 months ago
Selected Answer: C
C: Use Autoscaling Based on vCPU Usage
upvoted 1 times
...
emsherff
1 year, 7 months ago
Selected Answer: C
Autoscaling based on vCPU usage aligns well with the workload.
upvoted 1 times
...
emsherff
1 year, 7 months ago
Option A is manual intervention Option B is overprovisioning preemptively, which is an overkill ( autoscaling should be preferred) Option D - Unless the recipe recommendation model uses GPU-accelerated computations (e.g., some deep learning models), adding a GPU won't be beneficial and will increase costs. I would go with C - Autoscaling based on vCPU usage which aligns well with the workload.
upvoted 2 times
...
daidai75
1 year, 9 months ago
Selected Answer: C
Option B can only support exact 4x times traffic, but the requirement is four times "more", so B is not the best at least for me.
upvoted 1 times
...
b1a8fae
1 year, 9 months ago
Selected Answer: C
I would go for C as it enables autoscaling when exceeding a determined CPU usage threshold.
upvoted 1 times
...
pikachu007
1 year, 10 months ago
Selected Answer: C
Cost Optimization: It starts with the current machine type, avoiding unnecessary upfront costs, and scales only when needed. Autoscaling: It automatically adjusts compute resources based on vCPU usage, ensuring the endpoint can handle traffic spikes without manual intervention. Monitoring and Alerting: It provides visibility into resource usage and triggers alerts for potential issues, enabling proactive actions. Investigation: It encourages investigation of alerts to identify any underlying problems beyond expected traffic growth, ensuring overall system health.
upvoted 1 times
...
kalle_balle
1 year, 10 months ago
Selected Answer: B
Voting for B as it's the only option to autoscale even though the cost will go up. All other options include manual intervention.
upvoted 1 times
b1a8fae
1 year, 9 months ago
Wouldn't scaling up the vCPUs after receiving the alert also be manual? It comes across as such to me at least.
upvoted 1 times
...
...

Topic 1 Question 257


You recently trained an XGBoost model on tabular data. You plan to expose the model for internal use as an HTTP microservice. After deployment, you expect a small number of incoming requests. You want to productionize the model with the least amount of effort and latency. What should you do?

  • A. Deploy the model to BigQuery ML by using CREATE MODEL with the BOOSTED_TREE_REGRESSOR statement, and invoke the BigQuery API from the microservice.
  • B. Build a Flask-based app. Package the app in a custom container on Vertex AI, and deploy it to Vertex AI Endpoints.
  • C. Build a Flask-based app. Package the app in a Docker image, and deploy it to Google Kubernetes Engine in Autopilot mode.
  • D. Use a prebuilt XGBoost Vertex container to create a model, and deploy it to Vertex AI Endpoints.
Suggested Answer: D 🗳️
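Option D amounts to uploading the trained booster with a prebuilt serving image and deploying it. The sketch below only assembles the parameters; the container URI follows Google's published pattern for prebuilt XGBoost prediction images, but the exact version tag and bucket path are assumptions to check against the current documentation:

```python
# Parameters for uploading the trained booster with a prebuilt serving image
# (option D). Bucket path and version tag are illustrative assumptions.
upload_kwargs = {
    "display_name": "xgb-tabular-model",
    "artifact_uri": "gs://my-bucket/model/",  # directory holding the saved booster
    "serving_container_image_uri": (
        "us-docker.pkg.dev/vertex-ai/prediction/xgboost-cpu.1-7:latest"
    ),
}
# With the SDK this becomes roughly:
#   model = aiplatform.Model.upload(**upload_kwargs)
#   endpoint = model.deploy(machine_type="n1-standard-2")
```

No Flask app, Dockerfile, or GKE cluster is needed, which is why this is the least-effort path for a small internal workload.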

Comments

pikachu007
Highly Voted 1 year, 10 months ago
Selected Answer: D
Prebuilt Container: It eliminates the need to build and manage a custom container, reducing development time and complexity. Vertex AI Endpoints: It provides a managed serving infrastructure with low latency and high availability, optimizing performance for predictions. Minimal Effort: It involves simple steps of creating a Vertex model and deploying it to an endpoint, streamlining the process.
upvoted 8 times
...
b1a8fae
Highly Voted 1 year, 9 months ago
Selected Answer: D
Bit lost here. I would discard buiding a Flask app since that is the opposite of "minimum effort". Between A and D, I guess a prebuilt container (D) involves less effort, but I am not 100% confident.
upvoted 5 times
...
AzureDP900
Most Recent 1 year, 4 months ago
Option D is correct : Using a prebuilt XGBoost Vertex container (Option D) is the most straightforward approach. This container is specifically designed for running XGBoost models in production environments and can be easily deployed to Vertex AI Endpoints. This will allow you to expose your model as an HTTP microservice with minimal additional work.
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: D
Package the Model: Use a library like xgboost-server to create a minimal server for your XGBoost model. This package helps convert your model into a format suitable for serving predictions through an HTTP endpoint. Deploy to Cloud Functions: Deploy the packaged model server as a Cloud Function on Google Cloud Platform (GCP). Cloud Functions are serverless, lightweight execution environments ideal for event-driven applications like microservices. Configure Trigger: Set up an HTTP trigger for your Cloud Function, allowing it to be invoked through HTTP requests.
upvoted 2 times
...

Topic 1 Question 258

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 258 discussion

You work for an international manufacturing organization that ships scientific products all over the world. Instruction manuals for these products need to be translated to 15 different languages. Your organization’s leadership team wants to start using machine learning to reduce the cost of manual human translations and increase translation speed. You need to implement a scalable solution that maximizes accuracy and minimizes operational overhead. You also want to include a process to evaluate and fix incorrect translations. What should you do?

  • A. Create a workflow using Cloud Function triggers. Configure a Cloud Function that is triggered when documents are uploaded to an input Cloud Storage bucket. Configure another Cloud Function that translates the documents using the Cloud Translation API, and saves the translations to an output Cloud Storage bucket. Use human reviewers to evaluate the incorrect translations.
  • B. Create a Vertex AI pipeline that processes the documents launches, an AutoML Translation training job, evaluates the translations and deploys the model to a Vertex AI endpoint with autoscaling and model monitoring. When there is a predetermined skew between training and live data, re-trigger the pipeline with the latest data.
  • C. Use AutoML Translation to train a model. Configure a Translation Hub project, and use the trained model to translate the documents. Use human reviewers to evaluate the incorrect translations.
  • D. Use Vertex AI custom training jobs to fine-tune a state-of-the-art open source pretrained model with your data. Deploy the model to a Vertex AI endpoint with autoscaling and model monitoring. When there is a predetermined skew between the training and live data, configure a trigger to run another training job with the latest data.
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
daidai75
Highly Voted 1 year, 9 months ago
Selected Answer: C
The answer is C, to use Translation Hub. 1. Accuracy maximization: AutoML Translation uses machine learning to train a translation model on your specific data, which can lead to higher accuracy compared to generic translation models. 2. Minimal operational overhead: AutoML Translation handles the training and deployment of the translation model, reducing the need for manual intervention. 3. Evaluation and correction: The solution includes human reviewers to evaluate and correct any incorrect translations, ensuring high quality.
upvoted 7 times
...
Wuthuong1234
Most Recent 8 months, 1 week ago
Selected Answer: C
My first instinct was to go for A, but after reading through the question in detail, I think the right answer is C. It is mentioned that we are dealing with instructions for scientific products. The implication is that the instructions will therefore use very complicated and niche language, which the Natural Language API will most likely struggle to understand properly. AutoML Translation is meant for these types of tasks where the language is very domain-specific: https://cloud.google.com/translate/docs/advanced/automl-beginner
upvoted 1 times
...
juliorevk
11 months, 2 weeks ago
Selected Answer: A
A - It uses the most managed services, which reduces operational overhead. The Translation API produces good translations that keep improving as Google improves its translation services.
upvoted 2 times
...
DaleR
11 months, 2 weeks ago
Selected Answer: B
You want to minimize operational overhead.
upvoted 1 times
...
AzureDP900
1 year, 4 months ago
Using AutoML Translation (Option C) allows you to train a model on your data, which can be used for translation. You can then configure a Translation Hub project to manage the translation process and use human reviewers to evaluate any incorrect translations. It is scalable solution that maximizes accuracy and minimizes operational overhead.
upvoted 1 times
...
gscharly
1 year, 6 months ago
Selected Answer: C
if we assume there is training data available (source-target language pairs) then I would go with C.
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: C
Option A: Cloud Functions are suitable for simple tasks. This approach wouldn't leverage machine learning for improved translations and lacks features like model evaluation and retraining. Option B: Vertex AI pipelines with AutoML Translation training can be powerful, but it might be overkill for this scenario. Additionally, retraining based on a predetermined data skew might not be necessary if human review is effective at catching and correcting errors. Option D: While fine-tuning a pre-trained model with Vertex AI custom training offers flexibility, it requires more expertise and ongoing maintenance compared to the simpler approach of using AutoML Translation.
upvoted 3 times
...
b2aaace
1 year, 7 months ago
Answer A. It is the only option that makes sense overall. I would go for C if the first sentence, "Use AutoML Translation", were not there. You can't use AutoML because there is no training data.
upvoted 4 times
...
omermahgoub
1 year, 7 months ago
Selected Answer: C
C: Use AutoML Translation with Translation Hub. Here's why: 1. Scalability: - AutoML Translation: This simplifies model training without extensive manual configuration. - Translation Hub: Centrally stores and manages your translation models, facilitating deployment and reuse across various applications, promoting scalability for your 15 target languages. 2. Accuracy and Evaluation: - AutoML Translation: while pre-trained models might not be perfect, AutoML Translation lets you fine-tune the model with your specific scientific domain data (instruction manuals) to improve accuracy. - Human Review and Iteration: This allows for evaluation and correction of any inaccurate translations, improving overall quality. This is crucial for technical documents like instruction manuals.
upvoted 2 times
omermahgoub
1 year, 7 months ago
Why not B: Retraining the model upon data skew detection can become cumbersome and impact translation speed. Translation Hub offers a more streamlined approach for managing model updates.
upvoted 1 times
...
...
emsherff
1 year, 7 months ago
Selected Answer: C
Translation Hub can manage translation workloads at scale and also integrate human feedback where required.
upvoted 1 times
...
edoo
1 year, 8 months ago
So what is the deal? pikachu007 authors the question, adds C as the suggested answer, and then votes for B?
upvoted 2 times
...
Sunny_M
1 year, 8 months ago
Selected Answer: B
Agree with pikachu007, I think there is no point in using ML once the manual(human) mode is added.
upvoted 1 times
...
b1a8fae
1 year, 9 months ago
Selected Answer: C
Translation Hub is a service that allows you to manage and automate your translation workflows on Google Cloud. You can use Translation Hub to upload the documents to a Cloud Storage bucket, select the source and target languages, and apply the trained model to translate the documents. You can use human reviewers to improve the quality and accuracy of the translations, and provide feedback to the ML model.
upvoted 3 times
...
pikachu007
1 year, 10 months ago
Selected Answer: B
Option A: While Cloud Functions provide automation, the Cloud Translation API uses generic models that might not be as accurate for domain-specific content, potentially leading to more human corrections. Option C: Translation Hub offers collaboration features but lacks automated model training and pipeline orchestration, requiring more manual effort. Option D: Vertex AI custom training jobs provide flexibility but require more expertise and effort compared to AutoML Translation, and the pre-trained model might not be as well-suited for the specific domain.
upvoted 2 times
...

Topic 1 Question 259

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 259 discussion

You have developed an application that uses a chain of multiple scikit-learn models to predict the optimal price for your company’s products. The workflow logic is shown in the diagram. Members of your team use the individual models in other solution workflows. You want to deploy this workflow while ensuring version control for each individual model and the overall workflow. Your application needs to be able to scale down to zero. You want to minimize the compute resource utilization and the manual effort required to manage this solution. What should you do?

  • A. Expose each individual model as an endpoint in Vertex AI Endpoints. Create a custom container endpoint to orchestrate the workflow.
  • B. Create a custom container endpoint for the workflow that loads each model’s individual files. Track the versions of each individual model in BigQuery.
  • C. Expose each individual model as an endpoint in Vertex AI Endpoints. Use Cloud Run to orchestrate the workflow.
  • D. Load each model’s individual files into Cloud Run. Use Cloud Run to orchestrate the workflow. Track the versions of each individual model in BigQuery.
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️
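To make the orchestration idea in answer C concrete: a thin Cloud Run service would call each model's Vertex AI endpoint in turn and feed one stage's output into the next. The sketch below uses plain callables as stand-ins for the endpoint clients, and the two-stage demand/pricing workflow is purely illustrative (the question's actual diagram is not shown here):

```python
# Minimal sketch of option C's orchestration layer. In a real Cloud Run
# service, each stage would be a Vertex AI endpoint client; here plain
# callables stand in so the chaining logic itself is visible.
from typing import Callable, Sequence

def chain_predictions(
    instance: dict,
    stages: Sequence[Callable[[dict], dict]],
) -> dict:
    """Feed the output of each model stage into the next one."""
    result = instance
    for stage in stages:
        result = stage(result)
    return result

# Stubs standing in for two deployed scikit-learn models.
def demand_model(x: dict) -> dict:
    return {**x, "demand": 2.0 * x["base_price"]}

def pricing_model(x: dict) -> dict:
    return {**x, "optimal_price": x["base_price"] + 0.1 * x["demand"]}

if __name__ == "__main__":
    out = chain_predictions({"base_price": 10.0}, [demand_model, pricing_model])
    print(out["optimal_price"])  # 10.0 + 0.1 * 20.0 = 12.0
```

Because the orchestrator holds no model files, each model keeps its own version history in Vertex AI Endpoints, and Cloud Run scales the orchestrator to zero between requests.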

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
AzureDP900
1 year, 4 months ago
Option C is right because: 1)Exposing individual models as Vertex AI Endpoints (Option C) allows for version tracking, which is essential for maintaining consistency across different workflows. 2)Using Cloud Run to orchestrate the workflow (Option C) enables you to scale down to zero and minimize compute resource utilization. 3)You want to deploy your application while ensuring version control for each individual model and the overall workflow.
upvoted 3 times
...
gscharly
1 year, 6 months ago
Selected Answer: C
B and D are not correct, since BQ is not the best approach for version tracking. A would require more manual work.
upvoted 1 times
...
guilhermebutzke
1 year, 8 months ago
My Answer: C. B and D: not correct, BigQuery is not the best approach to track model versions. A and C: looking at "ensuring version control for each individual model" (endpoints), "be able to scale down to zero", and "minimize the compute resource utilization and the manual effort required to manage this solution", I think Cloud Run is the best option for those cases. https://www.youtube.com/watch?v=nhwYc4StHIc&ab_channel=GoogleCloudTech
upvoted 4 times
...
pikachu007
1 year, 10 months ago
Selected Answer: C
Option A: A custom container endpoint for orchestration adds complexity and management overhead. Option B: Loading model files directly into a custom container endpoint can lead to versioning challenges and potential conflicts if models are shared across workflows. Option D: Using BigQuery for model versioning is not its primary function and might introduce complexities in model loading and management.
upvoted 4 times
...

Topic 1 Question 260

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 260 discussion

You are developing a model to predict whether a failure will occur in a critical machine part. You have a dataset consisting of a multivariate time series and labels indicating whether the machine part failed. You recently started experimenting with a few different preprocessing and modeling approaches in a Vertex AI Workbench notebook. You want to log data and track artifacts from each run. How should you set up your experiments?

  • A. 1. Use the Vertex AI SDK to create an experiment and set up Vertex ML Metadata.
    2. Use the log_time_series_metrics function to track the preprocessed data, and use the log_merrics function to log loss values.
  • B. 1. Use the Vertex AI SDK to create an experiment and set up Vertex ML Metadata.
    2. Use the log_time_series_metrics function to track the preprocessed data, and use the log_metrics function to log loss values.
  • C. 1. Create a Vertex AI TensorBoard instance and use the Vertex AI SDK to create an experiment and associate the TensorBoard instance.
    2. Use the assign_input_artifact method to track the preprocessed data and use the log_time_series_metrics function to log loss values.
  • D. 1. Create a Vertex AI TensorBoard instance, and use the Vertex AI SDK to create an experiment and associate the TensorBoard instance.
    2. Use the log_time_series_metrics function to track the preprocessed data, and use the log_metrics function to log loss values.
Show Suggested Answer Hide Answer
Suggested Answer: B 🗳️
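As a sketch of what answer B's setup looks like in code: the Vertex AI SDK creates the experiment and run, and the two logging functions handle per-epoch and summary values. Project, region, experiment, and run names below are placeholders, the GCP calls sit inside a function because they need credentials, and note the point raised in the comments that log_time_series_metrics needs a backing TensorBoard instance:

```python
# Sketch of answer B: create a Vertex AI experiment run and log data with the
# Vertex AI SDK. All names are illustrative placeholders.

def summarize_losses(loss_by_epoch: dict) -> dict:
    """Summary metrics to log once per run: best and last-epoch loss."""
    last_epoch = max(loss_by_epoch)
    return {"best_loss": min(loss_by_epoch.values()),
            "last_loss": loss_by_epoch[last_epoch]}

def log_training_run(project: str, region: str, loss_by_epoch: dict) -> None:
    from google.cloud import aiplatform  # requires google-cloud-aiplatform

    aiplatform.init(project=project, location=region,
                    experiment="failure-prediction")
    with aiplatform.start_run("run-1"):
        # Per-epoch losses as a time series, summary values as plain metrics.
        for epoch, loss in sorted(loss_by_epoch.items()):
            aiplatform.log_time_series_metrics({"loss": loss}, step=epoch)
        aiplatform.log_metrics(summarize_losses(loss_by_epoch))
```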

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
gscharly
Highly Voted 1 year, 6 months ago
Selected Answer: C
log_time_series_metrics requires setting Tensorboard: https://cloud.google.com/vertex-ai/docs/experiments/log-data assign_input_artifacts can be used to track input data: https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/experiments/get_started_with_vertex_experiments.ipynb
upvoted 8 times
...
fitri001
Highly Voted 1 year, 6 months ago
Selected Answer: B
Vertex AI Experiment and ML Metadata: This is the foundation for tracking experiments and artifacts within Vertex AI. Creating an experiment allows you to group related runs and log data associated with those runs. ML Metadata helps manage the lineage of data and models used in your experiments. Logging Data: log_time_series_metrics: This function is specifically designed for tracking time-series data, making it suitable for logging the preprocessed multivariate time series data in your experiment. log_metrics: This function is appropriate for logging loss values during model training. It can handle numerical values like loss efficiently. By combining these techniques, you can effectively track both the preprocessed data (time series) and the training performance metrics (loss values) within your Vertex AI Experiment.
upvoted 5 times
fitri001
1 year, 6 months ago
Option A: It lacks the functionality to log preprocessed data (no log_time_series_metrics). Options C and D: While TensorBoard can be used for visualization, it's not directly related to logging data within Vertex AI Experiments. Additionally, assign_input_artifact isn't the correct method for logging time series data.
upvoted 3 times
...
...
dija123
Most Recent 3 weeks, 5 days ago
Selected Answer: B
Agree with B, No need for tensorBoard.
upvoted 1 times
...
Dirtie_Sinkie
1 year, 1 month ago
Selected Answer: C
C sounds more correct
upvoted 1 times
...
tungdeptraiqua
1 year, 3 months ago
Selected Answer: B
A and B are the same
upvoted 3 times
rajshiv
11 months, 2 weeks ago
In A, there's a typo in log_merrics (should be log_metrics). Therefore A is incorrect.
upvoted 1 times
...
...
omermahgoub
1 year, 7 months ago
Selected Answer: B
Why B? 1. Experiment Creation: The Vertex AI SDK establishes a context for grouping your training runs and facilitates experiment management. 2. By setting up Vertex ML Metadata (which can only be done when creating an experiment with the Vertex AI SDK), you enable tracking of artifacts and metrics associated with each experiment run. 3. The log_time_series_metrics function is well-suited for tracking the preprocessed multivariate time series data associated with each experiment run. This allows you to analyze how preprocessing impacts model performance.
upvoted 3 times
...
Yan_X
1 year, 7 months ago
Selected Answer: B
B. The assign_input_artifacts method is used to associate input artifacts with an experiment; it is not used to log time series and labels. A and B differ only by a minor typo (metrics vs. merrics), so select B.
upvoted 2 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: C
My Answer: C. The assign_input_artifact method is a Vertex AI Experiments method to track the preprocessed data, while log_time_series_metrics is a Vertex AI TensorBoard function to log metrics over time. Look: https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/experiments/build_model_experimentation_lineage_with_prebuild_code.ipynb https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/experiments/comparing_local_trained_models.ipynb
upvoted 2 times
...
b1a8fae
1 year, 9 months ago
Selected Answer: C
C. TensorBoard for experimentation and comparison of different model runs. assign_input_artifacts to track preprocessed data, since it links artifacts as inputs to the execution. https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Execution#google_cloud_aiplatform_Execution_assign_input_artifacts Using log_time_series_metrics would make sense if what we were doing is logging a metric, which we aren't when we track preprocessed data not yet run by the model.
upvoted 2 times
...

Topic 1 Question 261

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 261 discussion

You are developing a recommendation engine for an online clothing store. The historical customer transaction data is stored in BigQuery and Cloud Storage. You need to perform exploratory data analysis (EDA), preprocessing and model training. You plan to rerun these EDA, preprocessing, and training steps as you experiment with different types of algorithms. You want to minimize the cost and development effort of running these steps as you experiment. How should you configure the environment?

  • A. Create a Vertex AI Workbench user-managed notebook using the default VM instance, and use the %%bigquery magic commands in Jupyter to query the tables.
  • B. Create a Vertex AI Workbench managed notebook to browse and query the tables directly from the JupyterLab interface.
  • C. Create a Vertex AI Workbench user-managed notebook on a Dataproc Hub, and use the %%bigquery magic commands in Jupyter to query the tables.
  • D. Create a Vertex AI Workbench managed notebook on a Dataproc cluster, and use the spark-bigquery-connector to access the tables.
Show Suggested Answer Hide Answer
Suggested Answer: B 🗳️
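For reference, the BigQuery integration that options A and B lean on amounts to very little code. In a Workbench notebook, the `%%bigquery` cell magic (loaded with `%load_ext google.cloud.bigquery` once per kernel, if not already available) runs a query and returns the result as a pandas DataFrame named `df`; the project and table names below are illustrative placeholders:

```
%%bigquery df
SELECT customer_id, product_id, purchase_amount
FROM `my-project.retail.transactions`
LIMIT 1000
```

Managed notebooks additionally expose a BigQuery browser pane inside JupyterLab, which is the "browse and query the tables directly from the JupyterLab interface" feature that answer B relies on.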

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
b1a8fae
Highly Voted 1 year, 9 months ago
Selected Answer: B
"Managed notebooks are usually a good choice if you want to use a notebook for data exploration, analysis, modeling, or as part of an end-to-end data science workflow. Managed notebooks instances let you perform workflow-oriented tasks without leaving the JupyterLab interface. They also have many integrations and features for implementing your data science workflow." vs. "User-managed notebooks can be a good choice for users who require extensive customization or who need a lot of control over their environment." Seems more like the former -> B
upvoted 7 times
...
AzureDP900
Most Recent 1 year, 4 months ago
B is right because this option allows you to minimize cost and development effort by using a managed notebook in Vertex AI Workbench, which integrates well with BigQuery and Cloud Storage. You can browse and query your data directly within the JupyterLab interface without having to create a separate BigQuery client or use the bq command-line tool.
upvoted 3 times
...
pinimichele01
1 year, 6 months ago
Selected Answer: B
see b1a8fae
upvoted 1 times
...
gscharly
1 year, 6 months ago
Selected Answer: A
agree with guilhermebutzke. Also, this option is easier to reuse in multiple experiments
upvoted 1 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: A
My Answer: A. A: The default VM instance is the best way to minimize cost, and the %%bigquery magic command is the easiest way to get data from BQ. B: The JupyterLab interface is not necessary to run the code; the %%bigquery magic commands are sufficient to get data and run queries easily. C: Dataproc Hub seems overkill and is more expensive than a default VM instance. D: The spark-bigquery-connector is unnecessary for reading the tables in a notebook; better to use %%bigquery.
upvoted 1 times
...
daidai75
1 year, 9 months ago
Selected Answer: B
https://cloud.google.com/bigquery/docs/visualize-jupyter
upvoted 1 times
...
shadz10
1 year, 9 months ago
Selected Answer: B
https://cloud.google.com/vertex-ai/docs/workbench/notebook-solution#:~:text=For%20users%20who%20have%20specific,user%2Dmanaged%20notebooks%20instance's%20VM.
upvoted 1 times
...
pikachu007
1 year, 10 months ago
Selected Answer: B
Option A: User-managed notebooks require VM instance management, adding cost and complexity. %%bigquery magic commands are still needed. Option C: Dataproc Hub adds unnecessary cost and complexity for simple BigQuery interactions. Option D: Spark-bigquery-connector adds complexity and overhead compared to the native BigQuery integration in managed notebooks.
upvoted 1 times
...

Topic 1 Question 262

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 262 discussion

You recently deployed a model to a Vertex AI endpoint and set up online serving in Vertex AI Feature Store. You have configured a daily batch ingestion job to update your featurestore. During the batch ingestion jobs, you discover that CPU utilization is high in your featurestore’s online serving nodes and that feature retrieval latency is high. You need to improve online serving performance during the daily batch ingestion. What should you do?

  • A. Schedule an increase in the number of online serving nodes in your featurestore prior to the batch ingestion jobs
  • B. Enable autoscaling of the online serving nodes in your featurestore
  • C. Enable autoscaling for the prediction nodes of your DeployedModel in the Vertex AI endpoint
  • D. Increase the worker_count in the ImportFeatureValues request of your batch ingestion job
Show Suggested Answer Hide Answer
Suggested Answer: B 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
dija123
1 month ago
Selected Answer: A
The daily batch ingestion time is already known, so we can schedule the node increase in advance.
upvoted 1 times
dija123
1 month ago
The problem with reactive scaling (autoscaling): autoscaling is reactive. It works like this: the batch job starts; CPU load on the existing nodes begins to spike; the monitoring system detects that the CPU has crossed the predefined threshold (e.g., 60%); the autoscaler initiates the process to add new nodes; the new nodes take time to provision and become ready to serve traffic (the "ramp-up time"). During that ramp-up period, which could be several minutes, the system is still under-provisioned, and online serving performance will be poor. The problem of high latency will still occur at the beginning of every batch job.
upvoted 1 times
...
...
mouthwash
10 months, 2 weeks ago
Selected Answer: B
Answer is B, coz how do you predict there will be a problem before the batch ingestion job? That seems preemptive and may be unnecessary. B aligns more coz the error has happened and now you're enabling autoscaling so that in the future it will autoscale.
upvoted 1 times
...
tardigradum
1 year, 2 months ago
Selected Answer: A
Agree with bobjr
upvoted 1 times
...
Prakzz
1 year, 4 months ago
Selected Answer: A
https://cloud.google.com/vertex-ai/docs/featurestore/managing-featurestores Specifically mentioned here that --> If CPU utilization is consistently high, consider increasing the number of online serving nodes for your featurestore.
upvoted 2 times
...
bobjr
1 year, 5 months ago
Selected Answer: A
Gemini, Perplexity AI, and ChatGPT all vote A, because: B. Enable Autoscaling: While autoscaling can be useful, it might not react quickly enough to sudden spikes in traffic during batch ingestion. Scheduling the increase ensures that the resources are available when needed.
upvoted 3 times
...
cruise93
1 year, 6 months ago
Selected Answer: D
This question is valid for the Legacy feature store. https://cloud.google.com/vertex-ai/docs/featurestore/ingesting-batch#import_job_performance
upvoted 1 times
pinimichele01
1 year, 6 months ago
"CPU utilization is high in your featurestore’s online serving nodes"
upvoted 1 times
...
...
daidai75
1 year, 9 months ago
Selected Answer: B
https://cloud.google.com/vertex-ai/docs/featurestore/managing-featurestores?&_gl=1*sswg5e*_ga*NDE2OTc3OTAzLjE3MDU4OTQ5OTE.*_ga_WH2QY8WWF5*MTcwNTkzNDM0NS40LjAuMTcwNTkzNDM0NS4wLjAuMA..&_ga=2.242492743.-416977903.1705894991#online_serving_nodes
upvoted 2 times
...
b1a8fae
1 year, 9 months ago
Selected Answer: B
Vertex AI Feature Store provides two options for online serving: Bigtable and optimized online serving. Both options support autoscaling, which means that the number of online serving nodes can automatically adjust to the traffic demand. By enabling autoscaling, you can improve the online serving performance and reduce the feature retrieval latency during the daily batch ingestion. Autoscaling also helps you optimize the cost and resource utilization of your featurestore.
upvoted 3 times
...
pikachu007
1 year, 10 months ago
Selected Answer: B
Option A: Manually scheduling node increases requires prior knowledge of batch ingestion times and might not be as responsive to unexpected workload spikes. Option C: Autoscaling prediction nodes in the Vertex AI endpoint might help with model prediction latency but doesn't directly address feature retrieval latency from the featurestore. Option D: Increasing worker_count in the batch ingestion job could speed up ingestion but might further strain online serving nodes, potentially worsening latency.
upvoted 2 times
iieva
1 year, 10 months ago
Hey Pikachu, did you pass the exam or are you preparing? I am preparing as well, and I have noticed that in many questions you chose the same answer I would choose, which is not the indicated answer in my Udemy exam-preparation course. Thanks and Best
upvoted 1 times
pikachu007
1 year, 9 months ago
Yes passed
upvoted 5 times
...
...
Sunny_M
1 year, 9 months ago
Hi @pikachu007 May I ask when did you pass the exam? was it after they updated the new questions? I need to take the exam ASAP, just want to make sure the new questions are valid?
upvoted 1 times
...
...

Topic 1 Question 263

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 263 discussion

You are developing a custom TensorFlow classification model based on tabular data. Your raw data is stored in BigQuery, contains hundreds of millions of rows, and includes both categorical and numerical features. You need to use a MaxMin scaler on some numerical features, and apply a one-hot encoding to some categorical features such as SKU names. Your model will be trained over multiple epochs. You want to minimize the effort and cost of your solution. What should you do?

  • A. 1. Write a SQL query to create a separate lookup table to scale the numerical features.
    2. Deploy a TensorFlow-based model from Hugging Face to BigQuery to encode the text features.
    3. Feed the resulting BigQuery view into Vertex AI Training.
  • B. 1. Use BigQuery to scale the numerical features.
    2. Feed the features into Vertex AI Training.
    3. Allow TensorFlow to perform the one-hot text encoding.
  • C. 1. Use TFX components with Dataflow to encode the text features and scale the numerical features.
    2. Export results to Cloud Storage as TFRecords.
    3. Feed the data into Vertex AI Training.
  • D. 1. Write a SQL query to create a separate lookup table to scale the numerical features.
    2. Perform the one-hot text encoding in BigQuery.
    3. Feed the resulting BigQuery view into Vertex AI Training.
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️
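Whichever option you favor, the two transformations the question names are simple. A plain-Python sketch with illustrative values (no TensorFlow dependency) shows what BigQuery SQL, a TFX Transform preprocessing_fn, or Keras preprocessing layers would each end up computing:

```python
# The two transformations at issue, in plain Python so the math is explicit.
# Values and vocabulary are illustrative; in TensorFlow these would typically
# be Keras Normalization/StringLookup layers or a TFX preprocessing_fn.

def min_max_scale(values):
    """Map values linearly onto [0, 1] using the column's min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: avoid divide-by-zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(value, vocab):
    """Encode one categorical value against a fixed vocabulary."""
    return [1 if value == v else 0 for v in vocab]

if __name__ == "__main__":
    print(min_max_scale([10.0, 20.0, 30.0]))              # [0.0, 0.5, 1.0]
    print(one_hot("SKU-B", ["SKU-A", "SKU-B", "SKU-C"]))  # [0, 1, 0]
```

The debate in the comments is really about where this runs: min-max scaling is a full-pass transformation (it needs the column's global min and max), and one-hot encoding hundreds of millions of rows up front inflates the stored dataset, which is why the answer hinges on BigQuery vs. Dataflow/TFRecords vs. on-the-fly encoding in TensorFlow.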

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
b2aaace
Highly Voted 1 year, 6 months ago
Selected Answer: C
"Full-pass stateful transformations aren't suitable for implementation in BigQuery. If you use BigQuery for full-pass transformations, you need auxiliary tables to store quantities needed by stateful transformations, such as means and variances to scale numerical features. Further, implementation of full-pass transformations using SQL on BigQuery creates increased complexity in the SQL scripts, and creates intricate dependency between training and the scoring SQL scripts." https://www.tensorflow.org/tfx/guide/tft_bestpractices#where_to_do_preprocessing
upvoted 9 times
Ankit267
10 months, 2 weeks ago
The only requirement is min-max scaling, not mean or variance. D: why add extra components like Cloud Storage and Vertex AI when it could be done easily in BQ, where the raw data is already stored?
upvoted 3 times
...
Prakzz
1 year, 4 months ago
Isn't Dataflow includes a lot of effort as the question asking to minimize the effort here?
upvoted 2 times
...
...
bobjr
Highly Voted 1 year, 5 months ago
Selected Answer: D
GPT says D, Gemini says B, Perplexity says C... I say D: stay in one tool, BQ, which is cheap and natively scalable. B has a risk of out-of-memory errors.
upvoted 6 times
...
dija123
Most Recent 1 month, 1 week ago
Selected Answer: C
Gemini 2.5 Pro is saying C
upvoted 1 times
dija123
1 month, 1 week ago
Preprocessing the data once and saving it as TFRecords is highly cost-effective. TFRecord is a binary format optimized for TensorFlow. Training from TFRecords in Cloud Storage is much faster and cheaper than re-scanning a massive BigQuery table for every training epoch.
upvoted 1 times
...
...
thescientist
10 months, 2 weeks ago
Selected Answer: B
With multiple epochs, the data is passed through the model multiple times. If you pre-encoded the categorical features (as in options C and D), you would be storing and repeatedly reading a much larger dataset (due to the one-hot encoding). This significantly increases storage costs and I/O overhead. By performing the one-hot encoding within TensorFlow during training (as in option B), the encoding happens on-the-fly for each batch of data during each epoch.
upvoted 2 times
...
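thescientist's storage argument can be made concrete with a toy sketch (stdlib only, all names are mine): pre-encoding a categorical column with vocabulary size V stores V values per row, while keeping the raw index and encoding on the fly per batch stores one value per row.

```python
def one_hot(index, vocab_size):
    """Dense one-hot vector for a single categorical value."""
    vec = [0] * vocab_size
    vec[index] = 1
    return vec

vocab = ["sku_a", "sku_b", "sku_c", "sku_d"]
rows = ["sku_b", "sku_d", "sku_a"]

# Pre-encoded dataset: every row stores len(vocab) values.
encoded = [one_hot(vocab.index(r), len(vocab)) for r in rows]

# Raw dataset: every row stores one integer index; encode on the fly per batch.
raw = [vocab.index(r) for r in rows]

values_pre = sum(len(v) for v in encoded)  # 3 rows * 4 vocab entries = 12 values
values_raw = len(raw)                      # 3 values
```

With hundreds of millions of rows and SKU-sized vocabularies, that per-row multiplier is what drives the storage and I/O cost of pre-encoding across multiple epochs.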
Ankit267
10 months, 2 weeks ago
Selected Answer: D
B & D as top 2 choices, C is including Dataflow unnecessarily. D as "minimize the effort and cost of your solution", still some room for B but I selected D
upvoted 1 times
...
pipefaxaf
1 year ago
Selected Answer: D
Option D minimizes effort and cost by using BigQuery to handle both the scaling and one-hot encoding. BigQuery is efficient for these types of preprocessing tasks, especially when dealing with large datasets. By preparing the data in BigQuery, you avoid the need to export data to other services or use additional resources for preprocessing, such as Dataflow. This approach provides a streamlined workflow by creating a preprocessed view in BigQuery, which can then be directly fed into Vertex AI Training without extra transformation steps. This helps optimize cost and simplicity while handling large tabular data effectively.
upvoted 4 times
DaleR
11 months, 2 weeks ago
Agree with pipefaxaf. Minimize effort is key here.
upvoted 1 times
...
...
YangG
1 year ago
Selected Answer: C
multiple epochs --> need to persist data after preprocessing
upvoted 4 times
...
wences
1 year, 1 month ago
Selected Answer: D
Option D, since it says to minimize effort and cost; adding anything other than BQ will increase complexity.
upvoted 3 times
...
AzureDP900
1 year, 4 months ago
Option C uses TFX (TensorFlow Extended) components with Dataflow, which is a great way to perform complex data preprocessing tasks like one-hot encoding and scaling. This approach allows you to process your data in a scalable and efficient manner, using Cloud Storage as the output location. By exporting the results as TFRecords, you can easily feed this preprocessed data into Vertex AI Training for model development.
upvoted 2 times
...
dija123
1 year, 4 months ago
Selected Answer: C
agree with TFX components with Dataflow
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: B
BigQuery for preprocessing: BigQuery is a serverless data warehouse optimized for large datasets. It can handle scaling numerical features using built-in functions like SCALE or QUANTILE_SCALE, reducing the need for complex custom logic or separate lookup tables.
TensorFlow for one-hot encoding: TensorFlow excels at in-memory processing. One-hot encoding of categorical features, especially text features like SKU names, can be efficiently performed within your TensorFlow model during training. This avoids unnecessary data movement or transformations in BigQuery.
Vertex AI Training: by feeding the preprocessed data (scaled numerical features) directly into Vertex AI Training, you leverage its managed infrastructure for training your custom TensorFlow model.
upvoted 2 times
fitri001
1 year, 6 months ago
Option A: creates unnecessary complexity and data movement. BigQuery is better suited for scaling numerical features, and TensorFlow is efficient for one-hot encoding.
Option C: TFX is a powerful framework for complex pipelines, but for a simpler scenario like this it might be overkill. Additionally, exporting data as TFRecords adds an extra step, potentially increasing cost and complexity.
Option D: one-hot encoding in BigQuery might be cumbersome for textual features like SKU names. It can be computationally expensive and result in data explosion. TensorFlow handles this efficiently within the model.
upvoted 1 times
...
...
cruise93
1 year, 6 months ago
Selected Answer: C
Agree with b1a8fae
upvoted 1 times
...
gscharly
1 year, 6 months ago
Selected Answer: C
agree with daidai75
upvoted 2 times
pinimichele01
1 year, 6 months ago
Option B is not suitable for the big volume of data processing????? BQ is not suitable for big volume?? For me it is B.
upvoted 1 times
...
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: B
My Answer: B
1. Use BigQuery to scale the numerical features: simpler and cheaper than using TFX components with Dataflow to scale the numerical features.
2. Feed the features into Vertex AI Training.
3. Allow TensorFlow to perform the one-hot text encoding: TensorFlow handles the one-hot text encoding better than BQ.
upvoted 4 times
...
daidai75
1 year, 9 months ago
Selected Answer: C
key messages: "contains hundreds of millions of rows, and includes both categorical and numerical features. You need to use a MaxMin scaler on some numerical features, and apply a one-hot encoding to some categorical features such as SKU names". Option B is not suitable for the big volume of data processing. Option C is better.
upvoted 2 times
...
b1a8fae
1 year, 9 months ago
Selected Answer: C
Inclined to choose C over B. By using TFX components with Dataflow, you can perform feature engineering on large-scale tabular data in a distributed and efficient way. You can use the Transform component to apply the MaxMin scaler and the one-hot encoding to the numerical and categorical features, respectively. You can also use the ExampleGen component to read data from BigQuery and the Trainer component to train your TensorFlow model.
upvoted 2 times
...
pikachu007
1 year, 10 months ago
Selected Answer: B
Option A: involves creating a separate lookup table and deploying a Hugging Face model in BigQuery, increasing complexity and cost.
Option C: while TFX offers robust preprocessing capabilities, it adds overhead for this use case and requires knowledge of Dataflow.
Option D: performing one-hot encoding in BigQuery can be less efficient than TensorFlow's optimized implementation.
upvoted 3 times
...

Topic 1 Question 264

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 264 discussion

You work for a retail company. You have been tasked with building a model to determine the probability of churn for each customer. You need the predictions to be interpretable so the results can be used to develop marketing campaigns that target at-risk customers. What should you do?

  • A. Build a random forest regression model in a Vertex AI Workbench notebook instance. Configure the model to generate feature importances after the model is trained.
  • B. Build an AutoML tabular regression model. Configure the model to generate explanations when it makes predictions.
  • C. Build a custom TensorFlow neural network by using Vertex AI custom training. Configure the model to generate explanations when it makes predictions.
  • D. Build a random forest classification model in a Vertex AI Workbench notebook instance. Configure the model to generate feature importances after the model is trained.
Show Suggested Answer Hide Answer
Suggested Answer: D 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
guilhermebutzke
Highly Voted 1 year, 8 months ago
Selected Answer: B
My Answer: B
“the probability of churn for each customer”: the probability is a number, so a regression problem (A, B, C).
“predictions to be interpretable”: explainable at prediction time, not only at the model level (B, C).
Choosing between “Build an AutoML tabular regression model” and “Build a custom TensorFlow neural network by using Vertex AI custom training”, I think B is the most relevant for the problem, though there is not enough information in the text to fully decide between the two.
upvoted 6 times
...
GCP_ML
Most Recent 7 months, 2 weeks ago
Selected Answer: D
Churn prediction, therefore a classification model, not regression. You can still predict class probabilities with a random forest classifier.
upvoted 4 times
el_vampiro
2 months ago
Doesn't regression here imply logistic regression and not linear regression?
upvoted 2 times
...
...
VinnyD
10 months, 1 week ago
Selected Answer: B
The model has been asked to provide a probability, and that is regression. AutoML models do provide explainability.
upvoted 4 times
...
Ankit267
10 months, 2 weeks ago
Selected Answer: D
Churn prediction, therefore a classification model, not regression.
upvoted 3 times
...
Omi_04040
11 months ago
Selected Answer: D
Recommends constructing a random forest classification model within a Vertex AI Workbench notebook instance and configuring it to generate feature importances post-training, aligns perfectly with the requirement as it correctly identifies the task as a classification problem and offers high interpretability through feature importances, making it the best choice for targeting interventions in a retail context to reduce customer churn.
upvoted 2 times
...
YangG
1 year ago
Selected Answer: B
Probability --> regression model
upvoted 3 times
...
wences
1 year, 1 month ago
Selected Answer: D
Churn probability is required; linear regression would give a continuous label value, while classification will provide the likelihood as requested.
upvoted 2 times
...
tardigradum
1 year, 3 months ago
Selected Answer: D
We can't use AutoML due to the lack of explicability. AutoML is a black box, and we can't know which model GCP is using under the hood: while it is true that you can use the feature importance tool when using AutoML, GCP doesn't publicly disclose the specific models used internally for each type of problem (classification, regression, etc.). AutoML employs a wide range of algorithms, from linear models and decision trees to more complex neural networks. Consequently, the lack of explicability leads us to discard any AutoML option. Regarding the classification/regression discussion, as Roulle says: "Churn problems are cases of classification. We don't predict the label, but the probability of belonging to a given class (churn or not). We then set a threshold to indicate the probability at which we can affirm that the person will or will not unsubscribe."
upvoted 3 times
...
Roulle
1 year, 4 months ago
Selected Answer: D
Churn problems are cases of classification. We don't predict the label, but the probability of belonging to a given class (churn or not). We then set a threshold to indicate the probability at which we can affirm that the person will or will not unsubscribe. We can eliminate all responses that mention regression (A & B). A random forest is therefore less complex to interpret than a neural network. So I'm pretty sure it's D
upvoted 4 times
...
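Roulle's point about thresholding class probabilities is the crux of why churn is framed as classification. A minimal sketch (all names are mine) of turning predicted churn probabilities, e.g. the per-class output of a random forest classifier, into a marketing target list:

```python
def at_risk_customers(churn_probs, threshold=0.5):
    """Return ids of customers whose predicted churn probability crosses the threshold."""
    return [cid for cid, p in churn_probs.items() if p >= threshold]

# Hypothetical classifier output, keyed by customer id.
probs = {"c1": 0.82, "c2": 0.10, "c3": 0.55}
targets = at_risk_customers(probs, threshold=0.5)
```

The threshold is a business choice: lowering it widens the campaign audience at the cost of contacting more customers who would not have churned.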
gscharly
1 year, 6 months ago
agree with Yan_X. This is a classification problem, so regression should not be used (rule out A&B). Neural networks don't have explainable features by default, and Random Forest provides global explanations...
upvoted 1 times
pinimichele01
1 year, 6 months ago
probability of churn for each customer......
upvoted 1 times
...
...
fitri001
1 year, 6 months ago
Selected Answer: B
Since interpretability is key for your churn prediction model to inform marketing campaigns, choose an interpretable model:
Logistic regression: a classic choice for interpretability. It provides coefficients for each feature, indicating how a unit increase in that feature impacts the probability of churn. Easy to understand and implement, it's a good starting point.
Decision trees with rule extraction: decision trees are inherently interpretable, with each branch representing a decision rule. By extracting these rules, you can understand the specific factors leading to churn (e.g., "Customers with low tenure and a high number of support tickets are more likely to churn").
upvoted 3 times
...
pinimichele01
1 year, 7 months ago
Selected Answer: B
the probability of churn for each customer -> regression -> B
upvoted 2 times
...
Yan_X
1 year, 7 months ago
I don't know which one is correct... As D is 'after the model is trained', it is not for each prediction. And B, 'AutoML tabular regression model', is regression, not a classification model...
upvoted 2 times
...
sonicclasps
1 year, 9 months ago
Selected Answer: B
the question asks for explainability for predictions, answer D does not provide that. Although not the ideal solution, B is the only answer that suits the requirements, because churn can also be expressed as a probability.
upvoted 1 times
tavva_prudhvi
1 year, 9 months ago
But, in Option B is says "AutoML Regression" if the problem statement is about classification!
upvoted 1 times
...
...
daidai75
1 year, 9 months ago
Selected Answer: D
The answer is D.
1. Churn prediction is a classification problem: we want to categorize customers as either churning or not churning, not predict a continuous value like revenue. Therefore, a classification model is needed.
2. Random forest models are interpretable: feature importances provide insights into which features contribute most to the model's predictions, making them a good choice for understanding why customers churn. This interpretability is crucial for developing targeted marketing campaigns.
3. Vertex AI Workbench is a suitable platform: it provides notebook instances for building and training models, making it a good choice for this task.
upvoted 3 times
...
shadz10
1 year, 9 months ago
Selected Answer: D
https://cloud.google.com/bigquery/docs/xai-overview
upvoted 2 times
...
pikachu007
1 year, 10 months ago
Selected Answer: D
Option A: uses a random forest regression model, not classification, which is not appropriate for predicting class probabilities.
Option B: while AutoML tabular can generate model explanations, random forests inherently provide more granular insights into feature importance.
Option C: neural networks can be less interpretable than tree-based models, and generating explanations for them often requires additional techniques and libraries.
upvoted 3 times
...

Topic 1 Question 265

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 265 discussion

You work for a company that is developing an application to help users with meal planning. You want to use machine learning to scan a corpus of recipes and extract each ingredient (e.g., carrot, rice, pasta) and each kitchen cookware (e.g., bowl, pot, spoon) mentioned. Each recipe is saved in an unstructured text file. What should you do?

  • A. Create a text dataset on Vertex AI for entity extraction Create two entities called “ingredient” and “cookware”, and label at least 200 examples of each entity. Train an AutoML entity extraction model to extract occurrences of these entity types. Evaluate performance on a holdout dataset.
  • B. Create a multi-label text classification dataset on Vertex AI. Create a test dataset, and label each recipe that corresponds to its ingredients and cookware. Train a multi-class classification model. Evaluate the model’s performance on a holdout dataset.
  • C. Use the Entity Analysis method of the Natural Language API to extract the ingredients and cookware from each recipe. Evaluate the model's performance on a prelabeled dataset.
  • D. Create a text dataset on Vertex AI for entity extraction. Create as many entities as there are different ingredients and cookware. Train an AutoML entity extraction model to extract those entities. Evaluate the model’s performance on a holdout dataset.
Show Suggested Answer Hide Answer
Suggested Answer: A 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
OpenKnowledge
1 month ago
Selected Answer: A
The Google Cloud Natural Language API primarily offers pre-trained models for entity extraction, identifying common entities like people, places, organizations, and events. While it provides robust capabilities for these standard entity types, it is not directly designed for custom entity extraction where you define and train the API on your own specific entity types. For custom entity extraction, you would typically need to use a more specialized service like Google Cloud's AutoML Natural Language. AutoML Natural Language allows you to train custom machine learning models to identify entities unique to your domain or use case by providing your own labeled training data. This enables you to extract entities that the pre-trained Natural Language API models might not recognize.
upvoted 1 times
...
billyst41
1 month, 3 weeks ago
Selected Answer: A
I think I'll go with A. C was what I picked first. From Gemini: Using the standard, pre-trained analyzeEntities method of the Natural Language API alone is not sufficient to extract specific cooking ingredients and cookware from freeform text. While it can identify common nouns, it lacks the specialized domain knowledge to reliably parse the unique structure of recipes.
upvoted 1 times
...
hit_cloudie
5 months, 3 weeks ago
Selected Answer: A
agree with Omi_04040
upvoted 1 times
...
Wuthuong1234
8 months, 1 week ago
Selected Answer: C
The Entity detection in the NLP API will be sufficient to identify ingredients and cookware-related words. It is much easier than training your own model in AutoML. Keep in mind that training on your own dataset could introduce some bias. Imagine your training data might cover many French or western recipes, but suddenly you get lots of Thai recipes in production. Your AutoML model would struggle to correctly identify ingredients that are not so common in western cooking such as lemongrass, kecap manis, kaffir or galangal.
upvoted 3 times
...
andrea_c_
11 months ago
Selected Answer: C
With A you must label a dataset. Since the entities that need to be recognized are pretty common this effort is not justified. Moreover, as specified in https://cloud.google.com/vertex-ai/docs/text-data/entity-extraction/prepare-data, "You must supply at least 1, and no more than 100, unique labels to annotate entities that you want to extract." So, it looks like the dataset has a limit of 100 entities, which I do not think is enough for this use case.
upvoted 2 times
...
Omi_04040
11 months ago
Selected Answer: A
This option involves creating a dataset specifically for entity extraction and training an AutoML model to identify ingredients and cookware. By labeling a minimum of 200 instances for each entity, it ensures a sufficient amount of data for training. Using a holdout dataset for assessment helps evaluate the model's performance. Overall, this approach seems appropriate for the task at hand. Reference: https://cloud.google.com/vertex-ai/docs/text-data/entity-extraction/prepare-data
upvoted 2 times
...
AzureDP900
1 year, 4 months ago
By choosing option A, you can leverage the power of machine learning to efficiently extract ingredients and cookware from recipes in a scalable manner. option C uses the Entity Analysis method of the Natural Language API, which might be a viable option if you had access to the API's pre-trained models. However, since you're working with Vertex AI, creating a dataset for entity extraction is a better choice.
upvoted 1 times
...
fitri001
1 year, 6 months ago
Selected Answer: A
For extracting ingredients and cookware from recipe text files, creating a text dataset on Vertex AI for entity extraction with a custom NER model is the better approach. While it requires more upfront effort for data labeling and training, it offers superior accuracy and control over the types of entities extracted. However, if you need a quick and easy solution to get started, the Natural Language API's Entity Analysis can be a temporary option. Be aware that the accuracy might be lower, and you might need to post-process the results to filter out irrelevant entities.
upvoted 2 times
...
omermahgoub
1 year, 7 months ago
Selected Answer: C
Natural Language API offers a pre-built solution for entity analysis which eliminates the need for custom model training and labeling large datasets, saving time and resources. Vertex AI AutoML can also be used for entity extraction, but it requires data labeling and training, which can be time-consuming for a vast number of potential ingredients and cookware.
upvoted 3 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: A
My Answer: A
A: the most suitable approach for this task, because we need to identify and extract specific named entities ("ingredient" and "cookware") from the text, not classify the entire recipe into predefined categories.
B: this approach would require classifying each recipe based on all possible ingredients and cookware, leading to a vast number of classes and potential performance issues.
C: this pre-built solution might not be as customizable or scalable as training a specific model for this task.
D: this is impractical and unnecessary, as the number of potential ingredients and cookware is vast.
upvoted 3 times
...
daidai75
1 year, 9 months ago
I prefer A. Option C is not the best, because the NLP API is designed to identify general entities within text. While it's effective for broad categories, it may not be as precise for specialized domains like cooking ingredients and cookware, which require a more tailored approach.
upvoted 2 times
...
b1a8fae
1 year, 9 months ago
Selected Answer: A
A. "... you might create an entity extraction model to identify specialized terminology in legal documents or patents." I prefer this over C, which might classify carrot as vegetable, chicken as meat... custom entity extraction allows you to specify what entities you wish to extract from the text.
upvoted 4 times
b1a8fae
1 year, 9 months ago
https://cloud.google.com/vertex-ai/docs/text-data/entity-extraction/prepare-data
upvoted 3 times
...
...
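To illustrate why the commenters above prefer custom entity types over the generic ones a pre-trained API returns, here is a toy dictionary-based tagger (the entity lists and function name are mine) that hard-codes what the custom "ingredient"/"cookware" entity types would represent. A trained extraction model generalizes beyond a fixed list, which a lookup like this cannot:

```python
INGREDIENTS = {"carrot", "rice", "pasta"}
COOKWARE = {"bowl", "pot", "spoon"}

def tag_recipe(text):
    """Naive gazetteer tagger for two custom entity types a generic API does not know."""
    found = {"ingredient": [], "cookware": []}
    for token in text.lower().replace(",", " ").split():
        if token in INGREDIENTS:
            found["ingredient"].append(token)
        elif token in COOKWARE:
            found["cookware"].append(token)
    return found

tags = tag_recipe("Boil the rice in a pot, then serve in a bowl")
```

An AutoML entity-extraction model plays the same role as `tag_recipe`, but learns the entity boundaries from the ~200 labeled examples per entity instead of an exhaustive word list.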
shadz10
1 year, 9 months ago
Selected Answer: C
Reconsidering my answer and going with C. Option A involves using AutoML entity extraction, which could be a valid approach. However, for extracting entities like ingredients and cookware, Google Cloud's pre-trained Natural Language API might be a more straightforward solution.
upvoted 1 times
tavva_prudhvi
1 year, 9 months ago
No, A is right, as the pre-trained API may not be as effective for this specific task unless the ingredients and cookware are already well-represented within the types of entities the API is trained to recognize. That approach might require less initial setup but could be less accurate for specialized domains like recipes.
upvoted 1 times
...
...
shadz10
1 year, 10 months ago
Selected Answer: A
A is the correct option here
upvoted 2 times
...
pikachu007
1 year, 10 months ago
Selected Answer: C
Option B: multi-label text classification is less suitable for identifying specific entities within text and would require labeling entire recipes with multiple classes, increasing complexity and reducing model specificity.
Option C: Natural Language API's Entity Analysis might not be as accurate for this specialized domain as a model trained on custom recipe data.
Option D: creating separate entities for each ingredient and cookware type would significantly increase labeling effort and potentially hinder model generalization.
upvoted 1 times
kalle_balle
1 year, 10 months ago
do you mean Option A?
upvoted 3 times
...
...

Topic 1 Question 266

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 266 discussion

You work for an organization that operates a streaming music service. You have a custom production model that is serving a “next song” recommendation based on a user's recent listening history. Your model is deployed on a Vertex AI endpoint. You recently retrained the same model by using fresh data. The model received positive test results offline. You now want to test the new model in production while minimizing complexity. What should you do?

  • A. Create a new Vertex AI endpoint for the new model and deploy the new model to that new endpoint. Build a service to randomly send 5% of production traffic to the new endpoint. Monitor end-user metrics such as listening time. If end-user metrics improve between models over time, gradually increase the percentage of production traffic sent to the new endpoint.
  • B. Capture incoming prediction requests in BigQuery. Create an experiment in Vertex AI Experiments. Run batch predictions for both models using the captured data. Use the user’s selected song to compare the models performance side by side. If the new model’s performance metrics are better than the previous model, deploy the new model to production.
  • C. Deploy the new model to the existing Vertex AI endpoint. Use traffic splitting to send 5% of production traffic to the new model. Monitor end-user metrics, such as listening time. If end-user metrics improve between models over time, gradually increase the percentage of production traffic sent to the new model.
  • D. Configure a model monitoring job for the existing Vertex AI endpoint. Configure the monitoring job to detect prediction drift and set a threshold for alerts. Update the model on the endpoint from the previous model to the new model. If you receive an alert of prediction drift, revert to the previous model.
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
fitri001
1 year ago
Selected Answer: C
For simplicity: if speed and simplicity are your top priorities, deploying to the existing endpoint with caution (close monitoring during deployment) can work --> choose C.
For safety and control: if minimizing risk and having better control over the testing process are more important, creating a new endpoint is the better option; this is generally the recommended approach for most production deployments --> choose A.
upvoted 2 times
...
daidai75
1 year, 3 months ago
Selected Answer: C
Here's why option C is preferable:
Minimized complexity: it leverages the existing endpoint, so there is no need to create and manage a new endpoint, reducing setup and maintenance overhead; Vertex AI provides built-in traffic splitting, simplifying traffic distribution.
Efficient testing and monitoring: sending a percentage of traffic to the new model allows a direct comparison with the current model's performance on real user data; starting with a small percentage mitigates risk and allows a gradual transition based on observed improvements; monitoring end-user metrics like listening time directly reflects user engagement and preference for the new recommendations.
upvoted 4 times
...
b1a8fae
1 year, 3 months ago
Selected Answer: C
Traffic splitting is a feature of Vertex AI that allows you to distribute the prediction requests among multiple models or model versions within the same endpoint. You can specify the percentage of traffic that each model or model version receives, and change it at any time. Traffic splitting can help you test the new model in production without creating a new endpoint or a separate service. You can deploy the new model to the existing Vertex AI endpoint, and use traffic splitting to send 5% of production traffic to the new model. You can monitor the end-user metrics, such as listening time, to compare the performance of the new model and the previous model. If the end-user metrics improve between models over time, you can gradually increase the percentage of production traffic sent to the new model. This solution can help you test the new model in production while minimizing complexity and cost.
upvoted 2 times
...
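In the google-cloud-aiplatform SDK, the split b1a8fae describes is set when deploying the new model to the existing endpoint, roughly via `endpoint.deploy(model=new_model, traffic_split=...)`. Since that requires a live project, the sketch below is only a toy simulation (all names are mine) of how a 95/5 split routes requests:

```python
import random

def split_traffic(n_requests, candidate_pct=5, seed=0):
    """Toy simulation of endpoint traffic splitting: route roughly candidate_pct%
    of requests to the candidate model and the rest to the current model."""
    rng = random.Random(seed)
    routed = {"current": 0, "candidate": 0}
    for _ in range(n_requests):
        model = "candidate" if rng.random() < candidate_pct / 100 else "current"
        routed[model] += 1
    return routed

counts = split_traffic(10_000, candidate_pct=5, seed=42)
```

The gradual rollout the option describes is just raising `candidate_pct` over time as end-user metrics hold up, until the candidate takes 100% of traffic.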
pikachu007
1 year, 4 months ago
Selected Answer: C
Option A: building a separate service adds unnecessary complexity and requires managing two endpoints.
Option B: batch predictions in Vertex AI Experiments might not reflect real-time user behavior and don't directly affect the production environment.
Option D: model monitoring alerts for prediction drift might be triggered by natural variations in user behavior instead of genuine performance issues, and could lead to unnecessary model rollbacks.
upvoted 2 times
...

Topic 1 Question 267

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 267 discussion

You created a model that uses BigQuery ML to perform linear regression. You need to retrain the model on the cumulative data collected every week. You want to minimize the development effort and the scheduling cost. What should you do?

  • A. Use BigQuery’s scheduling service to run the model retraining query periodically.
  • B. Create a pipeline in Vertex AI Pipelines that executes the retraining query, and use the Cloud Scheduler API to run the query weekly.
  • C. Use Cloud Scheduler to trigger a Cloud Function every week that runs the query for retraining the model.
  • D. Use the BigQuery API Connector and Cloud Scheduler to trigger Workflows every week that retrains the model.
Show Suggested Answer Hide Answer
Suggested Answer: A 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
daidai75
Highly Voted 1 year, 9 months ago
Selected Answer: A
No additional setup: BigQuery's scheduling feature is built-in, eliminating the need to create pipelines, functions, or workflows. Straightforward configuration: Setting up a schedule for a query is a simple process within the BigQuery interface.
upvoted 7 times
...
AzureDP900
Most Recent 1 year, 4 months ago
A is right Using BigQuery's scheduling service allows you to automate the retraining process without needing to write custom code or manage additional dependencies.
upvoted 2 times
...
b1a8fae
1 year, 9 months ago
Selected Answer: A
No-brainer A.
upvoted 3 times
...
pikachu007
1 year, 10 months ago
Selected Answer: A
Option B: Vertex AI Pipelines offer flexibility for complex workflows, but involve more development effort and potential costs for pipeline execution.
Option C: Cloud Functions provide a serverless way to execute code, but they incur execution costs and require additional configuration for triggering and permissions.
Option D: Workflows can manage complex orchestration, but configuring the BigQuery API Connector and Cloud Scheduler adds complexity and potential costs.
upvoted 3 times
...

Topic 1 Question 268

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 268 discussion

You want to migrate a scikit-learn classifier model to TensorFlow. You plan to train the TensorFlow classifier model using the same training set that was used to train the scikit-learn model, and then compare the performances using a common test set. You want to use the Vertex AI Python SDK to manually log the evaluation metrics of each model and compare them based on their F1 scores and confusion matrices. How should you log the metrics?

  • A. Use the aiplatform.log_classification_metrics function to log the F1 score, and use the aiplatform.log_metrics function to log the confusion matrix.
  • B. Use the aiplatform.log_classification_metrics function to log the F1 score and the confusion matrix.
  • C. Use the aiplatform.log_metrics function to log the F1 score and the confusion matrix.
  • D. Use the aiplatform.log_metrics function to log the F1 score, and use the aiplatform.log_classification_metrics function to log the confusion matrix.
Show Suggested Answer Hide Answer
Suggested Answer: D 🗳️

Comments

b1a8fae
Highly Voted 1 year, 9 months ago
Selected Answer: D
I go with D. log_classification_metrics currently support confusion matrix and ROC curve. https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_classification_metrics Because it is not explicitly mentioned in the docs of log_classification_metrics, I assume F1 Score must be logged with log_metrics. https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_metrics (if accuracy and recall are logged in the example, probably F1 is done the same way)
upvoted 7 times
...
fitri001
Highly Voted 1 year, 6 months ago
Selected Answer: B
aiplatform.log_classification_metrics is specifically designed for logging classification metrics, which includes F1 score and confusion matrix. aiplatform.log_metrics is a more generic function for logging any kind of metric, but it wouldn't capture the rich structure of a confusion matrix. Therefore, using aiplatform.log_classification_metrics allows you to log both F1 score and confusion matrix in a single call, simplifying your code and ensuring proper handling of these classification-specific metrics.
upvoted 5 times
fitri001
1 year, 6 months ago
While aiplatform.log_metrics can handle numeric values like F1 score, it wouldn't capture the complexity of a confusion matrix. A confusion matrix is a two-dimensional table and requires specific handling for proper logging. aiplatform.log_classification_metrics is designed for classification tasks and understands the structure of both F1 score and confusion matrix, allowing them to be logged efficiently in a single function call.
upvoted 2 times
fitri001
1 year, 6 months ago
Therefore, using separate functions like log_metrics for F1 score and log_classification_metrics for confusion matrix would be inefficient and might not capture the matrix structure accurately.
upvoted 2 times
tardigradum
1 year, 3 months ago
Hi fitri001. You are usually right but, in this particular case, I think D is the right answer. As you can see in the link I provide below, it "Currently support confusion matrix and ROC curve." Link: https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_classification_metrics
upvoted 2 times
...
...
...
pinimichele01
1 year, 6 months ago
https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_classification_metrics
upvoted 1 times
...
...
OpenKnowledge
Most Recent 3 weeks, 5 days ago
Selected Answer: D
The F1 score is not a classification-specific metric like the confusion matrix, so use aiplatform.log_metrics for logging the F1 score.
upvoted 1 times
...
qaz09
4 months ago
Selected Answer: B
In this example linked below you can see that they use log_classification_metrics and the output contains both F1 score and confusion matrix. Hence I am voting for B. https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/integrations/writers/Writing_Classification_Performance_Metrics_to_WhyLabs.ipynb
upvoted 2 times
...
Omi_04040
11 months ago
Selected Answer: D
D. Utilize the aiplatform.log_metrics function to log the F1 score, and employ the aiplatform.log_classification_metrics function to log the confusion matrix. This is the correct approach. aiplatform.log_metrics is appropriate for logging general metrics such as the F1 score, and aiplatform.log_classification_metrics is ideal for logging classification-specific metrics like the confusion matrix. Reference: https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_classification_metrics
upvoted 4 times
...
rajshiv
11 months, 2 weeks ago
Selected Answer: B
I think we can log both metrics together. D is a close second, but B seems to be the better answer.
upvoted 1 times
...
YangG
1 year ago
Selected Answer: D
d
upvoted 2 times
...
bobjr
1 year, 5 months ago
Selected Answer: D
https://cloud.google.com/vertex-ai/docs/experiments/log-data#classification-metrics
log_classification_metrics -> only the confusion matrix, not the F1 scores
log_metrics -> any number you want -> you can use it to store an F1 score
upvoted 4 times
...
gscharly
1 year, 6 months ago
Selected Answer: D
According to the docs, log_classification_metrics supports confusion matrix and ROC curve. Not sure if it means that it only supports those... Assuming those are the only ones supported, I would go with D.
upvoted 3 times
gscharly
1 year, 6 months ago
forgot to add the link: https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_classification_metrics
upvoted 2 times
...
...
omermahgoub
1 year, 7 months ago
Selected Answer: B
aiplatform.log_classification_metrics to log metrics relevant to classification tasks, including F1 score and confusion matrix.
upvoted 1 times
pinimichele01
1 year, 6 months ago
link?? i find only: https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_classification_metrics so D NOT B
upvoted 1 times
...
...
Yan_X
1 year, 8 months ago
Selected Answer: B
The aiplatform.log_classification_metrics function is designed to log classification metrics, including the F1 score and the confusion matrix. It takes the following arguments: predictions: The predicted labels. labels: The true labels. weight: The weight of each sample. logger: The logger to use. ---------------------------- The aiplatform.log_metrics function is designed to log general metrics, such as accuracy, loss, and precision. It takes the following arguments: metric: The metric to log. value: The value of the metric. step: The step at which the metric was logged. logger: The logger to use.
upvoted 1 times
...
daidai75
1 year, 9 months ago
Selected Answer: B
Actually, the F1 score is calculated from the precision and recall metrics. The log_classification_metrics is OK for both the confusion matrix and the F1 score.
upvoted 2 times
...
pikachu007
1 year, 10 months ago
Selected Answer: B
Option A: It's incorrect because aiplatform.log_metrics is a more general function that doesn't provide the same specialized structure for classification metrics.
Option C: While technically possible to log both metrics using aiplatform.log_metrics, it's less optimal as it requires manual formatting and might not be as easily interpreted by Vertex AI's visualization tools.
Option D: This is incorrect as it suggests using aiplatform.log_classification_metrics for the confusion matrix, but that function doesn't support logging confusion matrices directly.
upvoted 1 times
b1a8fae
1 year, 9 months ago
Option B also suggests using aiplatform.log_classification_metrics for the confusion matrix. Which is supported, btw. https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_classification_metrics
upvoted 4 times
...
...
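The split that answer D describes can be sketched in plain Python: compute the F1 score and confusion matrix locally, then log the scalar via log_metrics and the matrix via log_classification_metrics. This is a minimal sketch; the aiplatform calls are shown only as comments since they require an initialized Google Cloud project and an active experiment run, and the sample labels are made up.

```python
# Compute an F1 score and a confusion matrix from scratch, then show where
# the two Vertex AI SDK logging calls would go (as comments).

def confusion_matrix(y_true, y_pred, labels):
    idx = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[idx[t]][idx[p]] += 1
    return matrix

def f1_binary(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

y_true = [1, 0, 1, 1, 0, 1]   # hypothetical test-set labels
y_pred = [1, 0, 0, 1, 1, 1]   # hypothetical model predictions

f1 = f1_binary(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

# With an active experiment run, the logging would look roughly like:
# aiplatform.log_metrics({"f1_score": f1})
# aiplatform.log_classification_metrics(labels=["0", "1"], matrix=cm,
#                                       display_name="confusion-matrix")
```

The scalar goes through the generic log_metrics call, while the two-dimensional matrix goes through log_classification_metrics, which understands its structure.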

Topic 1 Question 269


View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 269 discussion

You are developing a model to help your company create more targeted online advertising campaigns. You need to create a dataset that you will use to train the model. You want to avoid creating or reinforcing unfair bias in the model. What should you do? (Choose two.)

  • A. Include a comprehensive set of demographic features
  • B. Include only the demographic groups that most frequently interact with advertisements
  • C. Collect a random sample of production traffic to build the training dataset
  • D. Collect a stratified sample of production traffic to build the training dataset
  • E. Conduct fairness tests across sensitive categories and demographics on the trained model
Suggested Answer: D 🗳️

Comments

AB_C
Highly Voted 11 months, 2 weeks ago
Selected Answer: E
D&E - right
upvoted 5 times
...
wences
Most Recent 1 year, 1 month ago
Selected Answer: D
From my statistical point of view, D and E will mitigate the effect of bias.
upvoted 3 times
...
AzureDP900
1 year, 4 months ago
D and E is right answer, question asks us to select 2 right answers • To avoid creating or reinforcing unfair bias in the model, you should collect a representative and diverse dataset (option D) that includes a stratified sample of production traffic. This ensures that your training data is inclusive and accurately represents the diversity of your target audience. • Once you have collected your training dataset, you should conduct fairness tests across sensitive categories and demographics on the trained model (option E). This involves evaluating whether the model treats different demographic groups fairly and without bias. If biases are detected, you can take steps to mitigate them and ensure that your model is fair and accurate.
upvoted 2 times
...
dija123
1 year, 4 months ago
Selected Answer: D
Agree with D and E
upvoted 1 times
...
omermahgoub
1 year, 7 months ago
Selected Answer: D
D. Stratified sampling to ensure the different demographic groups or categories are proportionally represented in the training data. This helps mitigate bias that might arise if certain groups are under-represented. E. Fairness tests can reveal disparities in how the model treats different populations, allowing you to identify and address potential biases.
upvoted 4 times
...
MultiCloudIronMan
1 year, 7 months ago
Selected Answer: D
D and E are the two answers; two selections are required.
upvoted 2 times
pinimichele01
1 year, 7 months ago
why not D and A?
upvoted 2 times
...
...
CHARLIE2108
1 year, 8 months ago
Selected Answer: D
I went D, E
upvoted 2 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: D
DE D. Collect a stratified sample of production traffic to build the training dataset: This ensures that the training data represents the diverse demographics that will be targeted by the advertising campaigns. Random sampling might unintentionally underrepresent certain groups, leading to biased model outputs. E. Conduct fairness tests across sensitive categories and demographics on the trained model: This allows you to identify and address any potential biases that may have emerged during the training process. Evaluating the model's performance on different groups helps ensure fair and responsible deployment.
upvoted 1 times
...
daidai75
1 year, 9 months ago
Selected Answer: D
I go for D & E: A stratified sample ensures that the training data represents the distribution of the target population across relevant demographics or other sensitive categories. This helps mitigate bias arising from underrepresented groups in the data. Regularly testing the model for fairness across sensitive categories helps identify and address potential bias issues before deploying the model in production. This can involve metrics like precision, recall, and F1 score for different demographic groups.
upvoted 1 times
...
b1a8fae
1 year, 9 months ago
Selected Answer: D
D, E. ChatGPT explanation below (but I think it makes quite a lot of sense).
Collect a stratified sample (Option D): Stratified sampling involves dividing the population into subgroups (strata) and then randomly sampling from each subgroup. This ensures that the training dataset represents the diversity of the population, helping to avoid biases. By collecting a stratified sample of production traffic, you are more likely to have a balanced representation of different demographic groups, reducing the risk of biased model outcomes.
Conduct fairness tests (Option E): After training the model, it's crucial to conduct fairness tests to evaluate its performance across different sensitive categories and demographics. This involves measuring the model's predictions and outcomes for various groups to identify any disparities. Fairness tests help you assess and address biases that may have been inadvertently introduced during the training process.
upvoted 3 times
...
shadz10
1 year, 9 months ago
Selected Answer: C
C, D - Conducting fairness tests across sensitive categories and demographics on the trained model is indeed important. However, this option focuses on post-training analysis rather than dataset creation. While it's a crucial step for ensuring fairness, it doesn't directly address how to create a training dataset to avoid bias. Hence C,D
upvoted 1 times
tavva_prudhvi
1 year, 9 months ago
Check b1a8fae comment on why D is better than C!
upvoted 1 times
...
...
pikachu007
1 year, 10 months ago
Selected Answer: D
D. Stratified Sampling: Randomly sampling your data might not accurately represent the diversity of your target audience, potentially introducing bias by over- or under-representing certain demographics. Stratified sampling ensures your training dataset reflects the distribution of sensitive features (e.g., age, gender, income) observed in your production traffic, helping mitigate bias during model training.
E. Fairness Testing: Simply collecting unbiased data isn't enough. Regularly testing your trained model for fairness across sensitive categories is crucial. This involves measuring and analyzing metrics like accuracy, precision, recall, and F1 score for different demographic groups. Identifying disparities in performance can trigger further investigation and potential re-training to address bias.
upvoted 2 times
...
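The stratified sampling that answer D recommends can be sketched in a few lines: sample the same fraction from each demographic stratum so every group keeps its production-traffic proportion in the training set. This is a minimal sketch under made-up field names ("group", "clicked") rather than a real pipeline.

```python
import random

def stratified_sample(rows, key, fraction, seed=0):
    """Sample `fraction` of rows from each stratum defined by `key`."""
    rng = random.Random(seed)
    strata = {}
    for row in rows:
        strata.setdefault(row[key], []).append(row)
    sample = []
    for group_rows in strata.values():
        k = max(1, round(len(group_rows) * fraction))
        sample.extend(rng.sample(group_rows, k))
    return sample

# Hypothetical production traffic: an 80/20 split between two groups.
traffic = (
    [{"group": "A", "clicked": 1}] * 80
    + [{"group": "B", "clicked": 0}] * 20
)
train = stratified_sample(traffic, key="group", fraction=0.5)

# Both groups keep their 80/20 proportion in the sample.
counts = {g: sum(1 for r in train if r["group"] == g) for g in ("A", "B")}
```

A plain random sample of 50 rows could easily under-represent group B; stratifying guarantees the proportions survive, which is exactly the bias-mitigation argument made in the comments above.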

Topic 1 Question 270


Exam Professional Machine Learning Engineer topic 1 question 270 discussion

You are developing an ML model in a Vertex AI Workbench notebook. You want to track artifacts and compare models during experimentation using different approaches. You need to rapidly and easily transition successful experiments to production as you iterate on your model implementation. What should you do?

  • A. 1. Initialize the Vertex SDK with the name of your experiment. Log parameters and metrics for each experiment, and attach dataset and model artifacts as inputs and outputs to each execution.
    2. After a successful experiment create a Vertex AI pipeline.
  • B. 1. Initialize the Vertex SDK with the name of your experiment. Log parameters and metrics for each experiment, save your dataset to a Cloud Storage bucket, and upload the models to Vertex AI Model Registry.
    2. After a successful experiment, create a Vertex AI pipeline.
  • C. 1. Create a Vertex AI pipeline with parameters you want to track as arguments to your PipelineJob. Use the Metrics, Model, and Dataset artifact types from the Kubeflow Pipelines DSL as the inputs and outputs of the components in your pipeline.
    2. Associate the pipeline with your experiment when you submit the job.
  • D. 1. Create a Vertex AI pipeline. Use the Dataset and Model artifact types from the Kubeflow Pipelines DSL as the inputs and outputs of the components in your pipeline.
    2. In your training component, use the Vertex AI SDK to create an experiment run. Configure the log_params and log_metrics functions to track parameters and metrics of your experiment.
Suggested Answer: A 🗳️

Comments

pikachu007
Highly Voted 1 year, 10 months ago
Selected Answer: A
Option B: Manually saving datasets and models to Cloud Storage and Model Registry introduces extra steps and potential for inconsistencies. Options C and D: Prioritizing pipeline creation limits flexibility and visibility during the experimentation phase, making it harder to track artifacts and compare models effectively.
upvoted 8 times
...
5091a99
Most Recent 8 months ago
Selected Answer: A
A. Model and Data artifacts allow you to retrieve the model and the data, so you don't need to explicitly store it separately. That's overkill.
upvoted 3 times
...
rajshiv
11 months, 2 weeks ago
Selected Answer: B
A does not specify where to store the model. I agree with bobjr
upvoted 2 times
...
AzureDP900
1 year, 4 months ago
Option A correctly describes how to rapidly and easily transition successful experiments to production by initializing the Vertex SDK with the experiment name, logging parameters and metrics, and attaching dataset and model artifacts. The second step of creating a Vertex AI pipeline after a successful experiment allows for easy iteration on the model implementation while maintaining track of the experiment's performance.
upvoted 1 times
...
bobjr
1 year, 5 months ago
Selected Answer: B
Answer B leverages more tools for responsibility splitting: they are still tools for early experiments, but they would help in the pipeline creation. C & D are overkill.
upvoted 2 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: A
I agree with these comments >> I will go for A, because the requirement is "rapidly and easily" >> B: Manually saving datasets and models to Cloud Storage and Model Registry introduces extra steps and potential for inconsistencies. >> Options C and D: Prioritizing pipeline creation limits flexibility and visibility during the experimentation phase, making it harder to track artifacts and compare models effectively.
upvoted 4 times
...
daidai75
1 year, 9 months ago
Selected Answer: A
I will go for A, because the requirement is to "rapidly and easily" transition successful experiments to production. Options B, C, and D are too complex to conduct.
upvoted 3 times
...
b1a8fae
1 year, 9 months ago
Selected Answer: A
I believe it is A, for the same reasons as pikachu.
upvoted 2 times
...

Topic 1 Question 271


Exam Professional Machine Learning Engineer topic 1 question 271 discussion

You recently created a new Google Cloud project. After testing that you can submit a Vertex AI Pipeline job from the Cloud Shell, you want to use a Vertex AI Workbench user-managed notebook instance to run your code from that instance. You created the instance and ran the code but this time the job fails with an insufficient permissions error. What should you do?

  • A. Ensure that the Workbench instance that you created is in the same region of the Vertex AI Pipelines resources you will use.
  • B. Ensure that the Vertex AI Workbench instance is on the same subnetwork of the Vertex AI Pipeline resources that you will use.
  • C. Ensure that the Vertex AI Workbench instance is assigned the Identity and Access Management (IAM) Vertex AI User role.
  • D. Ensure that the Vertex AI Workbench instance is assigned the Identity and Access Management (IAM) Notebooks Runner role.
Suggested Answer: C 🗳️

Comments

bobjr
11 months, 1 week ago
Selected Answer: C
The job fails, not access to the notebook.
upvoted 2 times
...
fitri001
1 year ago
Selected Answer: C
Vertex AI has its own set of specific roles that control access to resources within the Vertex AI platform itself, such as datasets, models, and endpoints. The Vertex AI Notebook Runner falls under this category
upvoted 3 times
...
omermahgoub
1 year ago
Selected Answer: C
The insufficient permissions error suggests your instance lacks the required authorization to access Vertex AI Pipelines resources.
upvoted 4 times
...
Yan_X
1 year, 2 months ago
Selected Answer: C
The question is asking about submitting a Vertex AI Pipeline job, not simply running notebooks on Vertex AI Workbench. The role required should be the IAM Vertex AI User role. So it is C.
upvoted 3 times
...
daidai75
1 year, 3 months ago
Selected Answer: D
I have done the test, it is D
upvoted 1 times
...
b1a8fae
1 year, 3 months ago
Selected Answer: C
I decided to change my mind to C after realizing we need the aiplatform.pipelineJobs permissions, which are present in the Vertex AI User role. Not sure if the Notebooks Runner role allows running notebooks from pipeline jobs; plus, it's specified that it is only allowed to run scheduled notebooks (and there is no mention of scheduling anywhere here).
upvoted 3 times
...
b1a8fae
1 year, 3 months ago
Selected Answer: D
I say D. You want to run the code, that's your purpose, and you have insufficient permissions, so the only permission you need to solve this problem is being able to run the notebook. Plus, what is an "AI user role"? It is not a predefined role according to the docs: https://cloud.google.com/vertex-ai/docs/workbench/user-managed/iam#iam_roles
upvoted 1 times
b1a8fae
1 year, 3 months ago
Apparently "Vertex AI user role" is indeed a thing. I just did not see this link: https://cloud.google.com/vertex-ai/docs/general/access-control#predefined-roles. My point remains: not being able to run the code seems to be the issue here.
upvoted 1 times
...
...
pikachu007
1 year, 4 months ago
Selected Answer: C
A. Region Compatibility: While regional compatibility is important, it's not the primary cause of this permission error. B. Subnet Matching: Subnet alignment is usually not a requirement for Vertex AI pipeline job submission. D. Notebooks Runner Role: This role is primarily for executing notebook code, not managing Vertex AI resources.
upvoted 3 times
...

Topic 1 Question 272


Exam Professional Machine Learning Engineer topic 1 question 272 discussion

You work for a semiconductor manufacturing company. You need to create a real-time application that automates the quality control process. High-definition images of each semiconductor are taken at the end of the assembly line in real time. The photos are uploaded to a Cloud Storage bucket along with tabular data that includes each semiconductor’s batch number, serial number, dimensions, and weight. You need to configure model training and serving while maximizing model accuracy. What should you do?

  • A. Use Vertex AI Data Labeling Service to label the images, and train an AutoML image classification model. Deploy the model, and configure Pub/Sub to publish a message when an image is categorized into the failing class.
  • B. Use Vertex AI Data Labeling Service to label the images, and train an AutoML image classification model. Schedule a daily batch prediction job that publishes a Pub/Sub message when the job completes.
  • C. Convert the images into an embedding representation. Import this data into BigQuery, and train a BigQuery ML K-means clustering model with two clusters. Deploy the model and configure Pub/Sub to publish a message when a semiconductor’s data is categorized into the failing cluster.
  • D. Import the tabular data into BigQuery, use Vertex AI Data Labeling Service to label the data and train an AutoML tabular classification model. Deploy the model, and configure Pub/Sub to publish a message when a semiconductor’s data is categorized into the failing class.
Suggested Answer: A 🗳️

Comments

omermahgoub
Highly Voted 1 year, 7 months ago
Selected Answer: A
Real-time processing: uploading images to Cloud Storage triggers the AutoML image classification model for immediate processing, enabling real-time quality control decisions.
Image classification: the scenario focuses on classifying images as "passing" or "failing" quality, making image classification the appropriate approach.
Pub/Sub notifications: Pub/Sub messaging efficiently alerts downstream systems about failing classifications, allowing for prompt quality control actions.
upvoted 5 times
...
OpenKnowledge
Most Recent 3 weeks, 5 days ago
Selected Answer: A
Option A provides real-time image classification and quality control
upvoted 1 times
...
AzureDP900
1 year, 4 months ago
Option A is correct. The high-definition images of each semiconductor are taken in real time at the end of the assembly line. The images are uploaded to Cloud Storage along with tabular data that includes batch number, serial number, dimensions, and weight. You need to configure model training and serving while maximizing model accuracy.
upvoted 2 times
...
b1a8fae
1 year, 9 months ago
Selected Answer: A
I go with A.
upvoted 3 times
...
pikachu007
1 year, 10 months ago
Selected Answer: D
Option B: Batch prediction jobs introduce latency, making them unsuitable for real-time quality control.
Option C: K-means clustering is an unsupervised learning technique that doesn't leverage labeled data to distinguish between passing and failing semiconductors, potentially compromising accuracy.
Option D: Tabular classification focuses on structured data, not images, and might overlook visual defects captured in the photos.
upvoted 1 times
daidai75
1 year, 9 months ago
I am afraid option D is not correct, since this is an image classification task.
upvoted 1 times
...
pikachu007
1 year, 10 months ago
The answer should be A*
upvoted 3 times
...
...
daidai75
1 year, 10 months ago
Selected Answer: A
The right answer should be A
upvoted 3 times
...

Topic 1 Question 273


Exam Professional Machine Learning Engineer topic 1 question 273 discussion

You work for a rapidly growing social media company. Your team builds TensorFlow recommender models in an on-premises CPU cluster. The data contains billions of historical user events and 100,000 categorical features. You notice that as the data increases, the model training time increases. You plan to move the models to Google Cloud. You want to use the most scalable approach that also minimizes training time. What should you do?

  • A. Deploy the training jobs by using TPU VMs with TPUv3 Pod slices, and use the TPUEmbedding API
  • B. Deploy the training jobs in an autoscaling Google Kubernetes Engine cluster with CPUs
  • C. Deploy a matrix factorization model training job by using BigQuery ML
  • D. Deploy the training jobs by using Compute Engine instances with A100 GPUs, and use the tf.nn.embedding_lookup API
Suggested Answer: A 🗳️

Comments

daidai75
Highly Voted 1 year, 4 months ago
Selected Answer: A
TPU (Tensor Processing Units) VMs are specialized hardware accelerators designed by Google specifically for machine learning tasks. TPUv3 Pod slices offer high scalability and are excellent for distributed training tasks. The TPUEmbedding API is optimized for handling large volumes of categorical features, which fits your scenario with 100,000 categorical features. This option is likely to offer the fastest training times due to specialized hardware and optimized APIs for large-scale machine learning tasks.
upvoted 9 times
...
omermahgoub
Highly Voted 1 year ago
Selected Answer: A
Addressing Bottleneck: As data size increases, CPU-based training becomes increasingly slow. TPUs are specifically designed to address this challenge, significantly accelerating training. Large Categorical Features: TPUEmbedding API efficiently handles embedding lookups for a vast number of categorical features, a common characteristic of recommender system data.
upvoted 5 times
...
JG123
Most Recent 1 year, 2 months ago
Option C
upvoted 1 times
...
guilhermebutzke
1 year, 2 months ago
Selected Answer: A
My Answer: A: most scalable approach that also minimizes training time: TPU using TPUEmbeading API https://www.tensorflow.org/api_docs/python/tf/tpu/experimental/embedding/TPUEmbedding
upvoted 2 times
...

Topic 1 Question 274


Exam Professional Machine Learning Engineer topic 1 question 274 discussion

You are training and deploying updated versions of a regression model with tabular data by using Vertex AI Pipelines, Vertex AI Training, Vertex AI Experiments, and Vertex AI Endpoints. The model is deployed in a Vertex AI endpoint, and your users call the model by using the Vertex AI endpoint. You want to receive an email when the feature data distribution changes significantly, so you can retrigger the training pipeline and deploy an updated version of your model. What should you do?

  • A. Use Vertex AI Model Monitoring. Enable prediction drift monitoring on the endpoint, and specify a notification email.
  • B. In Cloud Logging, create a logs-based alert using the logs in the Vertex AI endpoint. Configure Cloud Logging to send an email when the alert is triggered.
  • C. In Cloud Monitoring create a logs-based metric and a threshold alert for the metric. Configure Cloud Monitoring to send an email when the alert is triggered.
  • D. Export the container logs of the endpoint to BigQuery. Create a Cloud Function to run a SQL query over the exported logs and send an email. Use Cloud Scheduler to trigger the Cloud Function.
Suggested Answer: A 🗳️

Comments

CHARLIE2108
1 year, 3 months ago
Selected Answer: A
I went with A
upvoted 3 times
...
daidai75
1 year, 3 months ago
Selected Answer: A
Vertex AI Model Monitoring is specifically designed for this purpose and provides out-of-the-box functionality for monitoring the data distribution of your model's predictions. It can automatically detect drift and trigger alerts based on predefined thresholds, making it the most efficient and straightforward solution. Option B,C and D are either over complex or too many manual operations.
upvoted 2 times
...
b1a8fae
1 year, 3 months ago
Selected Answer: A
https://cloud.google.com/blog/topics/developers-practitioners/monitor-models-training-serving-skew-vertex-ai
upvoted 1 times
...
36bdc1e
1 year, 3 months ago
A. Prediction drift is the change in the distribution of feature values or labels over time.
upvoted 1 times
...
pikachu007
1 year, 4 months ago
Selected Answer: A
Options B and C: While Cloud Logging and Cloud Monitoring can be used for general monitoring, they don't have the same specialized focus on prediction drift, potentially requiring more complex setup and analysis. Option D: Exporting logs to BigQuery and creating a Cloud Function for analysis can be time-consuming and less efficient compared to Vertex AI Model Monitoring's out-of-the-box capabilities.
upvoted 1 times
...
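Vertex AI Model Monitoring works by comparing the distribution of serving feature data against a training-time baseline and firing an alert (with an email notification) when a distance metric crosses a threshold. A minimal sketch of the underlying idea, using an L-infinity distance between two categorical feature distributions; this is illustrative only, not the Vertex implementation, though 0.3 mirrors the service's documented default alert threshold:

```python
from collections import Counter

def linf_distance(baseline, serving):
    """L-infinity distance between two normalized frequency distributions."""
    b, s = Counter(baseline), Counter(serving)
    nb, ns = sum(b.values()), sum(s.values())
    keys = set(b) | set(s)
    return max(abs(b[k] / nb - s[k] / ns) for k in keys)

def drift_detected(baseline, serving, threshold=0.3):
    """Flag drift when the distribution distance exceeds the threshold."""
    return linf_distance(baseline, serving) > threshold

# Feature values seen at training time vs. in recent serving traffic
train = ["a"] * 50 + ["b"] * 50
serve = ["a"] * 10 + ["b"] * 90
```

When `drift_detected` returns True for `train` vs. `serve` (distance 0.4 > 0.3), that is the point where the managed service would email you so you can retrigger the training pipeline.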

Topic 1 Question 275


You have trained an XGBoost model that you plan to deploy on Vertex AI for online prediction. You are now uploading your model to Vertex AI Model Registry, and you need to configure the explanation method that will serve online prediction requests to be returned with minimal latency. You also want to be alerted when feature attributions of the model meaningfully change over time. What should you do?

  • A. 1. Specify sampled Shapley as the explanation method with a path count of 5.
    2. Deploy the model to Vertex AI Endpoints.
    3. Create a Model Monitoring job that uses prediction drift as the monitoring objective.
  • B. 1. Specify Integrated Gradients as the explanation method with a path count of 5.
    2. Deploy the model to Vertex AI Endpoints.
    3. Create a Model Monitoring job that uses prediction drift as the monitoring objective.
  • C. 1. Specify sampled Shapley as the explanation method with a path count of 50.
    2. Deploy the model to Vertex AI Endpoints.
    3. Create a Model Monitoring job that uses training-serving skew as the monitoring objective.
  • D. 1. Specify Integrated Gradients as the explanation method with a path count of 50.
    2. Deploy the model to Vertex AI Endpoints.
    3. Create a Model Monitoring job that uses training-serving skew as the monitoring objective.
Suggested Answer: A 🗳️

Comments

36bdc1e
Highly Voted 1 year, 3 months ago
A Sampled Shapley is a fast and scalable approximation of the Shapley value, which is a game-theoretic concept that measures the contribution of each feature to the model prediction. Sampled Shapley is suitable for online prediction requests, as it can return feature attributions with minimal latency. The path count parameter controls the number of samples used to estimate the Shapley value, and a lower value means faster computation. Integrated Gradients is another explanation method that computes the average gradient along the path from a baseline input to the actual input. Integrated Gradients is more accurate than Sampled Shapley, but also more computationally intensive
upvoted 5 times
...
pikachu007
Highly Voted 1 year, 4 months ago
Selected Answer: A
Explanation Method: Sampled Shapley: This method provides high-fidelity feature attributions while being computationally efficient, making it ideal for low-latency online predictions. Integrated Gradients: While also accurate, it's generally more computationally intensive than sampled Shapley, potentially introducing latency. Path Count: Lower Path Count (5): Reducing path count further decreases computation time, optimizing for faster prediction responses. Monitoring Objective: Prediction Drift: This type of monitoring detects changes in feature importance over time, aligning with the goal of tracking feature attribution shifts. Training-Serving Skew: This monitors discrepancies between training and serving data distributions, which isn't directly related to feature attributions.
upvoted 5 times
...
daidai75
Most Recent 1 year, 3 months ago
Selected Answer: A
Sampled Shapley is a method suitable for XGBoost models. A lower path count (like 5) would indeed ensure lower latency in explanations, but might compromise on the precision of the explanations.Model Monitoring - Prediction Drift: This monitors the change in model predictions over time, which can indirectly indicate a change in feature attributions, but it's not directly monitoring the attributions themselves.
upvoted 4 times
...
shadz10
1 year, 3 months ago
Selected Answer: A
not B as integrated gradients is only for Custom-trained TensorFlow models that use a TensorFlow prebuilt container to serve predictions and AutoML image models
upvoted 4 times
shadz10
1 year, 3 months ago
https://cloud.google.com/vertex-ai/docs/explainable-ai/overview
upvoted 1 times
...
...
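The path count trades accuracy for latency: it bounds how many feature permutations are sampled when approximating Shapley values, so a count of 5 returns attributions faster than a count of 50. A toy sampled-Shapley estimator (illustrative; Vertex AI's implementation differs) for a model `f` relative to a baseline input:

```python
import random

def sampled_shapley(f, x, baseline, path_count, seed=0):
    """Monte Carlo Shapley estimate: average each feature's marginal
    contribution over `path_count` random feature orderings."""
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    for _ in range(path_count):
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)
        prev = f(current)
        for i in order:
            current[i] = x[i]  # reveal feature i along this path
            new = f(current)
            phi[i] += new - prev
            prev = new
    return [p / path_count for p in phi]

# For a linear model the exact Shapley value is w_i * (x_i - baseline_i),
# so even a small path count recovers it (no interaction terms).
f = lambda v: 2 * v[0] + 3 * v[1]
attributions = sampled_shapley(f, [1, 1], [0, 0], path_count=5)
```

For models with strong feature interactions, larger path counts reduce the variance of the estimate at the cost of more model evaluations per request.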

Topic 1 Question 276


You work at a gaming startup that has several terabytes of structured data in Cloud Storage. This data includes gameplay time data, user metadata, and game metadata. You want to build a model that recommends new games to users, using the approach that requires the least amount of coding. What should you do?

  • A. Load the data in BigQuery. Use BigQuery ML to train an Autoencoder model.
  • B. Load the data in BigQuery. Use BigQuery ML to train a matrix factorization model.
  • C. Read data to a Vertex AI Workbench notebook. Use TensorFlow to train a two-tower model.
  • D. Read data to a Vertex AI Workbench notebook. Use TensorFlow to train a matrix factorization model.
Suggested Answer: B 🗳️

Comments

omermahgoub
1 year ago
Selected Answer: A
Minimal Coding: BigQuery ML provides a user-friendly interface for training models, minimizing the need for extensive coding in tools like TensorFlow (C & D) Efficient Data Processing: Training directly in BigQuery eliminates data movement and leverages BigQuery's scalable infrastructure.
upvoted 1 times
omermahgoub
1 year ago
Matrix Factorization: This collaborative filtering technique is commonly used for recommender systems. BigQuery ML offers built-in support for matrix factorization, making it a good choice for your scenario.
upvoted 3 times
fitri001
1 year ago
it means you choose B?
upvoted 1 times
omermahgoub
1 year ago
Yes, voted for A by mistake. The answer is B
upvoted 4 times
...
...
...
...
vaibavi
1 year, 2 months ago
Selected Answer: B
least amount of coding--> BQML recommendations--> matrix factorization
upvoted 4 times
...
guilhermebutzke
1 year, 2 months ago
Selected Answer: B
Using BigQuery ML for training a matrix factorization model would require less coding compared to building a custom model with TensorFlow in a Vertex AI Workbench notebook. BigQuery ML provides high-level APIs for machine learning tasks directly within the BigQuery environment, thus reducing the amount of coding needed for data preprocessing and model training. Matrix factorization is a commonly used technique for recommendation systems, making it a suitable choice for recommending new games to users based on their gameplay time data, user metadata, and game metadata.
upvoted 4 times
...
Yan_X
1 year, 3 months ago
Selected Answer: B
B https://developers.google.com/machine-learning/recommendation/collaborative/matrix
upvoted 3 times
...
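With BigQuery ML the whole training step is one SQL statement, which is why it wins on "least amount of coding". A hedged sketch, built here as a Python string: the dataset, table, and column names are made up, while `model_type='matrix_factorization'` and the `user_col`/`item_col`/`rating_col`/`feedback_type` options are the documented BQML knobs (gameplay time would be treated as implicit feedback):

```python
def matrix_factorization_sql(model_name, table, user_col, item_col, rating_col):
    """Build a BigQuery ML CREATE MODEL statement for game
    recommendations (all names are illustrative)."""
    return f"""
    CREATE OR REPLACE MODEL `{model_name}`
    OPTIONS(
      model_type = 'matrix_factorization',
      feedback_type = 'implicit',      -- gameplay time is an implicit signal
      user_col = '{user_col}',
      item_col = '{item_col}',
      rating_col = '{rating_col}'
    ) AS
    SELECT {user_col}, {item_col}, {rating_col}
    FROM `{table}`
    """

sql = matrix_factorization_sql(
    "gaming.recommender", "gaming.playtime",
    "user_id", "game_id", "hours_played")
```

The statement would be submitted with the BigQuery client or console; predictions then come from `ML.RECOMMEND` over the trained model, again with no custom training code.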

Topic 1 Question 277


You work for a large bank that serves customers through an application hosted in Google Cloud that is running in the US and Singapore. You have developed a PyTorch model to classify transactions as potentially fraudulent or not. The model is a three-layer perceptron that uses both numerical and categorical features as input, and hashing happens within the model.

You deployed the model to the us-central1 region on n1-highcpu-16 machines, and predictions are served in real time. The model's current median response latency is 40 ms. You want to reduce latency, especially in Singapore, where some customers are experiencing the longest delays. What should you do?

  • A. Attach an NVIDIA T4 GPU to the machines being used for online inference.
  • B. Change the machines being used for online inference to n1-highcpu-32.
  • C. Deploy the model to Vertex AI private endpoints in the us-central1 and asia-southeast1 regions, and allow the application to choose the appropriate endpoint.
  • D. Create another Vertex AI endpoint in the asia-southeast1 region, and allow the application to choose the appropriate endpoint.
Suggested Answer: C 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 8 months ago
Selected Answer: C
My Answer: C The bottleneck is network latency. So, A: Not Correct: might improve performance, but it's an expensive solution and may not be necessary if the bottleneck is network latency. B: Not Correct: might offer slight improvement, but the primary issue is geographical distance between users and the model. C: CORRECT: This approach leverages the geographical proximity of the endpoints to the users, reducing latency for customers in Singapore without neglecting customers in the US. Additionally, using Vertex AI private endpoints ensures secure and efficient communication between the application and the model. D: Not Correct: it's not the most efficient approach because it does not utilize the existing infrastructure in the us-central1 region, and managing multiple endpoints might introduce additional complexity.
upvoted 10 times
tavva_prudhvi
1 year, 7 months ago
Deploying in additional regions (D) does not necessarily negate or underutilize existing deployments but rather complements them to provide a better global service.
upvoted 7 times
...
...
f084277
Highly Voted 12 months ago
Selected Answer: C
You work for a BANK. An application accesses the model and serves predictions to customers. The application is in US and Singapore. The application should never access the model over the public internet. Therefore, private endpoints.
upvoted 5 times
...
devops_bms
Most Recent 9 months ago
Selected Answer: C
Using private endpoints shortens the network path between the model and the client app.
upvoted 1 times
...
uatud3
11 months, 3 weeks ago
Selected Answer: D
I picked D. Sounds like the most logical answer
upvoted 1 times
...
wences
1 year, 1 month ago
Selected Answer: D
I don't have any link to support this other than a simple analysis: if you want the data or process to be low latency, you need to deploy closer to where it is required, in this case to Singapore customers, which reduces latency and addresses the requirement.
upvoted 1 times
...
inc_dev_ml_001
1 year, 5 months ago
Selected Answer: D
I think it's D because C and D should work in the same way, but ensuring the connection through a private endpoint isn't necessary: the question says nothing about security or sensitive information. The scope for a generic endpoint is "accessible from anywhere"; the scope for a private endpoint is "accessible only within a VPC or private connections". Don't see why to do that; it's only a matter of latency, not a matter of safety.
upvoted 2 times
f084277
12 months ago
"You work for a bank".... it's C
upvoted 2 times
...
...
GuineaPigHunter
1 year, 5 months ago
Selected Answer: D
Not sure why I'd choose C over D, my choice is D. Model is already deployed to us-central1 so now it's only a matter of deploying it to asia-southeast1 and letting the app choose the closer endpoint. Why the need for private endpoints and what will happen with the current already deployed model in us-central1?
upvoted 2 times
...
omermahgoub
1 year, 7 months ago
Selected Answer: C
Deploying the model to a Vertex AI private endpoint in the Singapore region brings the model closer to users in that region. This significantly reduces network latency for those users compared to accessing the model hosted in us-central1. Allowing the application to choose the appropriate endpoint based on user location (through private endpoints) ensures users access the geographically closest model replica, optimizing latency. Why not D: creating a separate endpoint in Singapore would allow regional deployment, it wouldn't automatically route users to the closest endpoint. You still need additional logic within the application for regional routing, increasing complexity.
upvoted 2 times
...
tavva_prudhvi
1 year, 7 months ago
Selected Answer: D
By having an endpoint in the asia-southeast1 region (Singapore), the data doesn't have to travel as far, significantly reducing the round-trip time. Allowing the application to choose the appropriate endpoint based on the user's location ensures that requests are handled by the nearest available server, optimizing response times for users in different regions.
upvoted 2 times
...
shuvs
1 year, 7 months ago
Selected Answer: D
I think it is D. C is questionable as why do you need a private endpoint?
upvoted 1 times
pinimichele01
1 year, 7 months ago
see guilhermebutzke
upvoted 1 times
...
AzureDP900
1 year, 4 months ago
Yes, using private endpoints does introduce some overhead. Additional latency: Establishing a connection to a private endpoint may add some latency compared to using the public endpoint. Increased complexity: Managing private endpoints requires additional configuration and management, which can increase the overall complexity of your deployment. However, in this scenario, the benefits of using private endpoints (security, control, and isolation) outweigh the potential overhead. The goal is to reduce latency for users in Singapore, and by deploying a private endpoint closer to them, you can achieve this while maintaining security and control over access to your model.
upvoted 1 times
AzureDP900
1 year, 4 months ago
I will go with C. In this scenario, deploying the model to Vertex AI private endpoints in both us-central1 and asia-southeast1 regions is necessary because: The application is hosted in Google Cloud and serves customers through APIs. By using private endpoints, you can create a secure connection between your application and the Vertex AI endpoint without exposing the model or data to the public internet. This ensures that sensitive information remains within the cloud. Private endpoints provide an IP address that is unique to your project, making it easier to manage access control and network policies. Without private endpoints, you would need to expose your model or data to the public internet, which increases the risk of unauthorized access and security breaches. Private endpoints provide a secure and controlled environment for hosting your model, ensuring that only authorized users can access it.
upvoted 1 times
...
...
...
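Whichever of C or D you favor, the application needs region-aware routing so US traffic stays on us-central1 and Singapore traffic hits asia-southeast1. A minimal sketch of that routing logic; the endpoint resource names and the region map are hypothetical:

```python
# Hypothetical endpoint resource names, one per serving region
ENDPOINTS = {
    "us-central1": "projects/bank/locations/us-central1/endpoints/123",
    "asia-southeast1": "projects/bank/locations/asia-southeast1/endpoints/456",
}

# Which serving region is closest to each application region (assumed)
NEAREST = {"US": "us-central1", "SG": "asia-southeast1"}

def endpoint_for(app_region):
    """Route a request to the geographically closest endpoint,
    defaulting to us-central1 for unknown regions."""
    return ENDPOINTS[NEAREST.get(app_region, "us-central1")]
```

The routing itself is identical for public and private endpoints; the private-endpoint variant (option C) only changes how the application reaches the chosen endpoint (over a VPC peering rather than the public internet).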

Topic 1 Question 278


You need to train an XGBoost model on a small dataset. Your training code requires custom dependencies. You want to minimize the startup time of your training job. How should you set up your Vertex AI custom training job?

  • A. Store the data in a Cloud Storage bucket, and create a custom container with your training application. In your training application, read the data from Cloud Storage and train the model.
  • B. Use the XGBoost prebuilt custom container. Create a Python source distribution that includes the data and installs the dependencies at runtime. In your training application, load the data into a pandas DataFrame and train the model.
  • C. Create a custom container that includes the data. In your training application, load the data into a pandas DataFrame and train the model.
  • D. Store the data in a Cloud Storage bucket, and use the XGBoost prebuilt custom container to run your training application. Create a Python source distribution that installs the dependencies at runtime. In your training application, read the data from Cloud Storage and train the model.
Suggested Answer: A 🗳️

Comments

omermahgoub
Highly Voted 1 year, 7 months ago
Selected Answer: A
Given the focus on minimizing startup time, and based on the information about XGBoost prebuilt container dependencies available here https://cloud.google.com/vertex-ai/docs/training/pre-built-containers#xgboost A: Separate Data and Custom Container is the best approach for minimizing startup time, especially for small datasets. Separating data in Cloud Storage keeps the container image lean, leading to faster download and startup compared to bundling data within the container. B. The prebuilt Container could have unnecessary components, potentially increasing the image size and impacting startup time.
upvoted 5 times
...
guilhermebutzke
Highly Voted 1 year, 8 months ago
Selected Answer: A
My Answer: A. Focusing on "training code requires custom dependencies" and "minimize the startup time of your training job", the best choice is A, because using a custom container (dependencies pre-installed) and reading the data from GCS is the fastest way.
upvoted 5 times
...
Fer660
Most Recent 2 months, 1 week ago
Selected Answer: A
Baking the data into the container (C)? Whether the dataset is small or not -- this just sounds like a pretty bad practice.
upvoted 1 times
...
Foxy2021
1 year ago
I select D: While A could work, D is the optimal solution because it balances efficiency, ease of setup, and performance. It minimizes startup time by leveraging Google’s prebuilt XGBoost container and offers flexibility by installing custom dependencies at runtime. This approach avoids the overhead of building and maintaining a custom container from scratch, which is unnecessary for a small dataset with only specific custom dependency needs.
upvoted 2 times
...
wences
1 year, 1 month ago
Selected Answer: A
The fastest way is to have most of the things already installed, so that is why option A fits the best
upvoted 1 times
...
omribt
1 year, 4 months ago
Selected Answer: C
The focus is on startup time, and the dataset is small, so the container should still be of reasonable size. Downloading data from Cloud Storage introduces a delay.
upvoted 4 times
...
bobjr
1 year, 5 months ago
Selected Answer: C
The dataset is small, xgboost is implemented in python... (correcting my error A answer)
upvoted 1 times
...
bobjr
1 year, 5 months ago
Selected Answer: A
The dataset is small, xgboost is implemented in python...
upvoted 1 times
...
CHARLIE2108
1 year, 7 months ago
Why not C?
upvoted 1 times
tavva_prudhvi
1 year, 7 months ago
Because, Including the data in the container image is not recommended as it increases the image size and makes it less reusable.
upvoted 3 times
raidenrock
1 year, 6 months ago
But the description mentioned it is a small dataset and requires minimizing latency which makes C the best per requirement, there is no mentioning to make the container reusable whatsoever
upvoted 1 times
...
...
...
Yan_X
1 year, 8 months ago
Selected Answer: B
B. The XGBoost prebuilt custom container already includes the XGBoost library and all of its dependencies. A Python source distribution avoids the overhead of reading the data from Cloud Storage a second time. Loading data into a pandas DataFrame is convenient to work with in Python; pandas is for data analysis and manipulation.
upvoted 3 times
tavva_prudhvi
1 year, 7 months ago
However, the question specifically says that the training code requires custom dependencies beyond those included in the prebuilt container. Therefore, using the prebuilt container alone would not be sufficient in this case. & regarding the use of a Python source distribution to avoid reading data from Cloud Storage multiple times, it's important to consider the trade-off between startup time and potential performance gains. While including the data in the source distribution might save some time during training, it also increases the size of the container and can lead to longer startup times. For small datasets, the overhead of reading data from Cloud Storage is typically negligible compared to the benefits of a smaller container and faster startup.
upvoted 2 times
tavva_prudhvi
1 year, 7 months ago
Also, creating a Python source distribution that includes the data and installs the dependencies at runtime can increase startup time since dependencies have to be installed every time the job runs
upvoted 1 times
...
...
...
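With option A the custom container bakes the dependencies into the image, so at startup the training application only has to pull the small dataset. A sketch of such an entrypoint: the `/gcs/...` path reflects the Cloud Storage FUSE mount that Vertex AI custom training provides, but the bucket name, column layout, and XGBoost parameters are assumptions; the stdlib `csv` loader works on any local or mounted copy of the file:

```python
import csv

def load_features(path, label_col="label"):
    """Read a small tabular dataset into feature rows and labels."""
    rows, labels = [], []
    with open(path, newline="") as fh:
        for record in csv.DictReader(fh):
            labels.append(float(record.pop(label_col)))
            rows.append([float(v) for v in record.values()])
    return rows, labels

def main():
    # Illustrative paths: /gcs/ is Vertex AI's Cloud Storage FUSE mount
    rows, labels = load_features("/gcs/my-bucket/train.csv")
    import xgboost as xgb  # baked into the custom image, not installed at runtime
    model = xgb.XGBRegressor(n_estimators=50).fit(rows, labels)
    model.save_model("/gcs/my-bucket/model.bst")

if __name__ == "__main__":
    main()
```

The key startup-time point is in the comment: nothing is `pip install`ed when the job launches, which is exactly what options B and D would do.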

Topic 1 Question 279


You are creating an ML pipeline for data processing, model training, and model deployment that uses different Google Cloud services. You have developed code for each individual task, and you expect a high frequency of new files. You now need to create an orchestration layer on top of these tasks. You only want this orchestration pipeline to run if new files are present in your dataset in a Cloud Storage bucket. You also want to minimize the compute node costs. What should you do?

  • A. Create a pipeline in Vertex AI Pipelines. Configure the first step to compare the contents of the bucket to the last time the pipeline was run. Use the scheduler API to run the pipeline periodically.
  • B. Create a Cloud Function that uses a Cloud Storage trigger and deploys a Cloud Composer directed acyclic graph (DAG).
  • C. Create a pipeline in Vertex AI Pipelines. Create a Cloud Function that uses a Cloud Storage trigger and deploys the pipeline.
  • D. Deploy a Cloud Composer directed acyclic graph (DAG) with a GCSObjectUpdateSensor class that detects when a new file is added to the Cloud Storage bucket.
Suggested Answer: C 🗳️

Comments

fitri001
Highly Voted 1 year, 5 months ago
Selected Answer: C
Option C appears to be the best choice for balancing the requirements of efficient orchestration, cost minimization, and ensuring the pipeline only runs when new files are present. By using a Cloud Function triggered by Cloud Storage events to deploy a Vertex AI Pipeline, you can leverage the event-driven model of Cloud Functions to minimize unnecessary runs and associated costs, while still using the powerful orchestration capabilities of Vertex AI Pipelines.
upvoted 6 times
fitri001
1 year, 5 months ago
why not D? Pros: Cloud Composer provides a powerful orchestration framework that can handle complex dependencies and workflows.GCSObjectUpdateSensor can efficiently detect new files in the bucket and trigger the pipeline. Cons: Cloud Composer can be relatively costly due to the continuous operation of its environment. Overhead of maintaining Cloud Composer for potentially simple file-triggered tasks.
upvoted 1 times
tardigradum
1 year, 3 months ago
I think we should use Cloud Composer here because of "that uses different Google Cloud services". Vertex AI is less integrated with the rest of services than Cloud Composer, which was designed exactly for that.
upvoted 1 times
...
...
...
OpenKnowledge
Most Recent 4 weeks, 1 day ago
Selected Answer: C
Vertex AI Pipelines is a serverless orchestration mechanism, whereas Cloud Composer (built on open-source Apache Airflow) requires infrastructure for core components like the scheduler, workers, and web server, so Cloud Composer incurs more compute cost. A Cloud Storage trigger automatically executes a function or service in response to an event, such as uploading, deleting, or updating a file in a bucket.
upvoted 1 times
...
juliorevk
11 months, 4 weeks ago
Probably C, because while D would be good, the question specifically says to minimize compute costs, which Cloud Composer does incur, whereas C is serverless.
upvoted 3 times
...
Foxy2021
1 year ago
My answer is D: While C (Cloud Function + Vertex AI Pipelines) is a viable approach for triggering ML pipelines, D (Cloud Composer DAG with GCSObjectUpdateSensor) is the more appropriate and scalable solution when your orchestration spans multiple Google Cloud services and you want to minimize costs by only triggering the pipeline when new files appear.
upvoted 1 times
...
tardigradum
1 year, 3 months ago
Selected Answer: D
The key here is "that uses different Google Cloud services". Taking this into account, Cloud Composer is the correct answer (for instance, Vertex AI pipelines is not integrated with classic Dataproc or Cloud Composer DAGs). Moreover, GCSObjectUpdateSensor is more efficient than a Cloud Function.
upvoted 1 times
...
Kili1
1 year, 5 months ago
Selected Answer: D
"Different Google Cloud services" and GCSObjectUpdateSensor: This sensor class specifically checks for updates to Cloud Storage objects. This ensures the DAG only triggers when there's a new file in the bucket, minimizing unnecessary executions.
upvoted 1 times
...
CHARLIE2108
1 year, 7 months ago
Why not D?
upvoted 1 times
...
Yan_X
1 year, 8 months ago
Selected Answer: C
C Cloud Function to be triggered by Cloud storage trigger, and then deploy the Vertex AI pipeline.
upvoted 3 times
...
JG123
1 year, 8 months ago
It's C. Vertex AI Pipelines are recommended for running ML pipelines!
upvoted 1 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: B
My Answer: B Cloud Function that uses a Cloud Storage trigger (”run if new files are present in your dataset in a Cloud Storage bucket”) and Cloud Composer directed acyclic graph (DAG) (”model deployment that uses different Google Cloud services”, ”orchestration layer on top of these tasks”,)
upvoted 1 times
tavva_prudhvi
1 year, 8 months ago
Cloud Composer already provides a way to orchestrate tasks, and creating a Cloud Function to deploy a DAG is not a common practice. The Cloud Function with a Cloud Storage trigger would be redundant since the GCSObjectUpdateSensor within the DAG itself can handle the file detection.
upvoted 5 times
...
...
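Option C wires a lightweight, event-driven trigger in front of a serverless pipeline. A sketch of the Cloud Function using the 1st-gen background-function signature for Cloud Storage events; the `dataset/` prefix convention, pipeline template path, and display name are assumptions:

```python
def should_trigger(file_name, prefix="dataset/", suffix=".csv"):
    """Only react to new data files, not to every object in the bucket
    (the prefix/suffix convention is an assumption)."""
    return file_name.startswith(prefix) and file_name.endswith(suffix)

def on_new_file(event, context):
    """Cloud Storage-triggered entry point: submit the Vertex AI
    pipeline only when a qualifying data file lands in the bucket."""
    if not should_trigger(event["name"]):
        return
    from google.cloud import aiplatform  # available in the function's runtime
    aiplatform.PipelineJob(
        display_name="data-to-deploy",
        template_path="gs://my-bucket/pipeline.json",  # illustrative path
        enable_caching=True,
    ).submit()
```

Because the function is billed per invocation and the pipeline runs only on matching uploads, no compute node sits idle waiting for files, unlike a Cloud Composer environment or a scheduled polling pipeline.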

Topic 1 Question 280


You are using Kubeflow Pipelines to develop an end-to-end PyTorch-based MLOps pipeline. The pipeline reads data from BigQuery, processes the data, conducts feature engineering, model training, model evaluation, and deploys the model as a binary file to Cloud Storage. You are writing code for several different versions of the feature engineering and model training steps, and running each new version in Vertex AI Pipelines. Each pipeline run is taking over an hour to complete. You want to speed up the pipeline execution to reduce your development time, and you want to avoid additional costs. What should you do?

  • A. Comment out the part of the pipeline that you are not currently updating.
  • B. Enable caching in all the steps of the Kubeflow pipeline.
  • C. Delegate feature engineering to BigQuery and remove it from the pipeline.
  • D. Add a GPU to the model training step.
Suggested Answer: B 🗳️

Comments

Yan_X
Highly Voted 1 year, 2 months ago
Selected Answer: B
B. "Different versions of the feature engineering and model training" steps, so enabling the cache helps reuse the results of previous runs. Probably not C: the question describes an "end-to-end" MLOps pipeline, and delegating feature engineering to BigQuery would no longer be "end-to-end".
upvoted 7 times
...
OpenKnowledge
Most Recent 4 weeks, 1 day ago
Selected Answer: B
This question points to integrating a Kubeflow Pipeline with Vertex AI by leveraging Vertex AI Pipelines as a serverless execution environment for workflows defined with the Kubeflow Pipelines (KFP) SDK.
upvoted 1 times
...
omermahgoub
1 year ago
B, and here's why: 1. Caching directly addresses the issue of redundant computations, especially for frequently used feature engineering versions 2. End-to-End" MLOps, Kubeflow Pipelines handle all stages, including feature engineering, maintaining your desired "end-to-end" workflow.
upvoted 1 times
...
JG123
1 year, 2 months ago
Answer is C
upvoted 1 times
...
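Pipeline caching works by fingerprinting each step's inputs and component spec and reusing the stored output when nothing changed, so only the steps you actually edited re-execute. The idea in miniature (a simplified sketch, not the KFP cache key algorithm):

```python
cache = {}
executions = 0

def run_step(name, fn, *inputs):
    """Re-run a step only when its (name, inputs) fingerprint is new."""
    global executions
    key = (name, inputs)
    if key not in cache:
        executions += 1
        cache[key] = fn(*inputs)
    return cache[key]

# First pipeline run executes both steps; a re-run with the same
# inputs hits the cache and executes nothing.
features = run_step("feature_eng", lambda d: [x * 2 for x in d], (1, 2, 3))
model = run_step("train", lambda f: sum(f), tuple(features))
```

In a real run, editing only the training step changes that step's fingerprint, so data reading, processing, and feature engineering are served from cache while training re-executes, which is what cuts the hour-long iteration loop without adding hardware cost.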

Topic 1 Question 281


You work at a large organization that recently decided to move their ML and data workloads to Google Cloud. The data engineering team has exported the structured data to a Cloud Storage bucket in Avro format. You need to propose a workflow that performs analytics, creates features, and hosts the features that your ML models use for online prediction. How should you configure the pipeline?

  • A. Ingest the Avro files into Cloud Spanner to perform analytics. Use a Dataflow pipeline to create the features, and store them in Vertex AI Feature Store for online prediction.
  • B. Ingest the Avro files into BigQuery to perform analytics. Use a Dataflow pipeline to create the features, and store them in Vertex AI Feature Store for online prediction.
  • C. Ingest the Avro files into Cloud Spanner to perform analytics. Use a Dataflow pipeline to create the features, and store them in BigQuery for online prediction.
  • D. Ingest the Avro files into BigQuery to perform analytics. Use BigQuery SQL to create features and store them in a separate BigQuery table for online prediction.
Suggested Answer: B 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 8 months ago
Selected Answer: B
My Answer: B “You need to propose a workflow that performs analytics, creates features, and hosts ”:  Ingest the Avro files into BigQuery to perform analytics “workflow that performs analytics, creates features”: Dataflow pipeline to create the features “and hosts the features that your ML models use for online prediction”:store them in Vertex AI Feature Store for online prediction
upvoted 9 times
...
carolctech
Most Recent 1 year ago
Selected Answer: B
B) BigQuery is designed for large-scale analytics, while Spanner (options A and C) is not, since it is more suited for transactional workloads. The Dataflow pipeline should be used to transform the Avro files into Parquet before ingesting it into BigQuery and is also optimal for feature engineering tasks. Vertex AI Feature Store is specifically designed for online feature management and serving, while storing features in BigQuery is not the best option for online prediction, due to potential latency.
upvoted 1 times
...
AzureDP900
1 year, 4 months ago
B is right The original audio recordings have an 8 kHz sample rate, which is sufficient for speech recognition. Using the Speech-to-Text API with synchronous recognition would require your application to wait for the transcription process to complete before proceeding. This could lead to performance issues and delays in processing large volumes of audio data. Asynchronous recognition, on the other hand, allows your application to continue processing without waiting for the transcription process to complete. The transcribed text can be retrieved later when needed.
upvoted 1 times
...
VinaoSilva
1 year, 4 months ago
Selected Answer: B
"performs analytics" = BigQuery; "hosts the features" = Vertex AI Feature Store
upvoted 1 times
...
emsherff
1 year, 7 months ago
Selected Answer: B
Vertex AI Feature Store is designed for managing and serving features for online prediction with low latency.
upvoted 2 times
...
MultiCloudIronMan
1 year, 7 months ago
Selected Answer: A
I think the answer is A because BigQuery does not support Avro format but CloudSpanner does.
upvoted 1 times
b2aaace
1 year, 7 months ago
FYI BigQuery supports the Avro format. Please check your facts
upvoted 4 times
...
...
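The architecture the suggested answer describes (batch analytics and feature creation offline, then materializing features into a low-latency store for online prediction) can be sketched with a pure-Python stand-in. The class, field names, and threshold below are illustrative, not Vertex AI Feature Store SDK calls.

```python
def build_features(rows):
    """Batch feature engineering (the analytics/Dataflow step): one feature
    row per entity. Fields and the spend threshold are hypothetical."""
    return {r["customer_id"]: {"total_spend": float(r["spend"]),
                               "is_high_value": r["spend"] > 100}
            for r in rows}

class OnlineFeatureStore:
    """Stand-in for an online store: point lookups by entity id, no scans."""
    def __init__(self):
        self._features = {}

    def ingest(self, features):
        self._features.update(features)

    def read(self, entity_id):
        return self._features[entity_id]

store = OnlineFeatureStore()
store.ingest(build_features([{"customer_id": "c1", "spend": 250},
                             {"customer_id": "c2", "spend": 40}]))
```

The point of the split is that online prediction needs constant-time lookups by entity, which an analytics warehouse does not guarantee.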

Topic 1 Question 282

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 282 discussion

You work at an organization that maintains a cloud-based communication platform that integrates conventional chat, voice, and video conferencing into one platform. The audio recordings are stored in Cloud Storage. All recordings have an 8 kHz sample rate and are more than one minute long. You need to implement a new feature in the platform that will automatically transcribe voice call recordings into a text for future applications, such as call summarization and sentiment analysis. How should you implement the voice call transcription feature following Google-recommended best practices?

  • A. Use the original audio sampling rate, and transcribe the audio by using the Speech-to-Text API with synchronous recognition.
  • B. Use the original audio sampling rate, and transcribe the audio by using the Speech-to-Text API with asynchronous recognition.
  • C. Upsample the audio recordings to 16 kHz, and transcribe the audio by using the Speech-to-Text API with synchronous recognition.
  • D. Upsample the audio recordings to 16 kHz, and transcribe the audio by using the Speech-to-Text API with asynchronous recognition.
Suggested Answer: B 🗳️

Comments

CHARLIE2108
Highly Voted 1 year, 7 months ago
Selected Answer: D
I went with D. "following Google-recommended best practices" https://cloud.google.com/speech-to-text/docs/optimizing-audio-files-for-speech-to-text#:~:text=We%20recommend%20a%20sample%20rate%20of%20at%20least%2016%20kHz%20in%20the%20audio%20files%20that%20you%20use%20for%20transcription%20with%20Speech%2Dto%2DText
upvoted 10 times
...
asmgi
Highly Voted 1 year, 3 months ago
Selected Answer: B
We have longer than minute, 8KHz recordings. https://cloud.google.com/speech-to-text/docs/best-practices-provide-speech-data "avoid re-sampling. For example, in telephony the native rate is commonly 8000 Hz, which is the rate that should be sent to the service." -> 8KHz https://cloud.google.com/speech-to-text/docs/sync-recognize "Synchronous speech recognition returns the recognized text for short audio (less than 60 seconds). To process a speech recognition request for audio longer than 60 seconds, use Asynchronous Speech Recognition." -> asynchronous So, the correct answer is B.
upvoted 7 times
...
el_vampiro
Most Recent 2 months ago
Selected Answer: B
No need to upsample per documentation
upvoted 1 times
...
qaz09
4 months ago
Selected Answer: B
B for sure not A and C -> we can not use synchronous recognition (recordings are more than 1 min) https://cloud.google.com/speech-to-text/docs/speech-to-text-requests#speech_requests not D -> do not resample your audio data, it will impair accuracy https://cloud.google.com/speech-to-text/docs/speech-to-text-requests#sample-rates
upvoted 1 times
...
Pau1234
11 months ago
Selected Answer: B
According to the documentation: If possible, set the sampling rate of the audio source to 16000 Hz. Otherwise, set the sample_rate_hertz to match the native sample rate of the audio source (instead of re-sampling). https://cloud.google.com/speech-to-text/docs/best-practices-provide-speech-data
upvoted 2 times
...
Omi_04040
11 months ago
Selected Answer: B
Lower sampling rates may reduce accuracy. However, avoid re-sampling. For example, in telephony the native rate is commonly 8000 Hz, which is the rate that should be sent to the service. https://cloud.google.com/speech-to-text/docs/best-practices-provide-speech-data
upvoted 1 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: D
While you can use the original 8 kHz sample rate, upsampling to 16 kHz is likely to improve transcription accuracy.
upvoted 1 times
...
carolctech
1 year ago
Selected Answer: D
The correct answer is D because the Google Cloud Speech-to-Text API recommends a sample rate of 16 kHz for optimal performance. While it can handle 8 kHz, the accuracy will be significantly lower. Synchronous recognition means the API waits for the entire audio file to be processed before returning a result. This is fine for short audio clips, but for recordings longer than a minute (as specified), it's highly inefficient and could lead to timeouts or delays in the application. Asynchronous recognition allows the API to process the audio in the background, returning a notification when the transcription is complete. This is much better suited for longer audio files and doesn't block the application.
upvoted 1 times
...
wences
1 year, 1 month ago
Selected Answer: B
Agree on B. If you read carefully the documentation pointed to, you will come to the conclusion that there is no need to upsample the voice.
upvoted 4 times
...
PhilipKoku
1 year, 5 months ago
Selected Answer: B
B) Use original sampling rate and use asynchronous recognition... "If possible, set the sampling rate of the audio source to 16000 Hz. Otherwise, set the sample_rate_hertz to match the native sample rate of the audio source (instead of re-sampling)." https://cloud.google.com/speech-to-text/docs/best-practices-provide-speech-data#sampling_rate
upvoted 5 times
...
livewalk
1 year, 5 months ago
Selected Answer: B
According to Google's recommendation on sampling rate: "If possible, set the sampling rate of the audio source to 16000 Hz. Otherwise, set the sample_rate_hertz to match the native sample rate of the audio source (instead of re-sampling)." So we should match the native sample rate (8 kHz) in the question.
upvoted 4 times
...
pinimichele01
1 year, 6 months ago
Selected Answer: B
https://cloud.google.com/speech-to-text/docs/best-practices-provide-speech-data: Capture audio with a sampling rate of 16,000 Hz or higher. Lower sampling rates may reduce accuracy. However, avoid re-sampling. For example, in telephony the native rate is commonly 8000 Hz, which is the rate that should be sent to the service. https://cloud.google.com/speech-to-text/docs/optimizing-audio-files-for-speech-to-text#sample_rate_frequency_range: It's possible to convert from one sample rate to another. However, there's no benefit to up-sampling the audio, because the frequency range information is limited by the lower sample rate and can't be recovered by converting to a higher sample rate. -----> B, not D
upvoted 3 times
...
SahandJ
1 year, 6 months ago
Selected Answer: B
According to the documentation, it's best to have a 16 kHz sample rate; however, one should avoid up-sampling and rather use the native sample rate.
upvoted 3 times
...
ludovikush
1 year, 7 months ago
Selected Answer: B
Following best practices, the easiest choice is B
upvoted 3 times
...
omermahgoub
1 year, 7 months ago
Selected Answer: D
Upsample to 16 kHz and Use Asynchronous Speech-to-Text Recognition
upvoted 1 times
...
tavva_prudhvi
1 year, 7 months ago
Selected Answer: D
Upsampling to 16 kHz: The Speech-to-Text API recommends an audio sample rate of 16 kHz for optimal transcription accuracy. Upsampling the 8 kHz recordings to 16 kHz will improve the quality of the transcription. Asynchronous Recognition: Asynchronous recognition is suitable for longer audio recordings (more than one minute). It allows you to submit the audio file and receive the transcription results later, which is more efficient for batch processing. https://cloud.google.com/speech-to-text/docs/best-practices-provide-speech-data
upvoted 4 times
...
guilhermebutzke
1 year, 8 months ago
Selected Answer: B
My Answer: B - Upsampling is not necessary (excludes C and D) - Asynchronous means executing tasks without a sequential order; therefore, it is preferred over synchronous recognition for longer audio recordings, as it allows for more efficient processing, especially when dealing with larger volumes of data.
upvoted 3 times
...
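The two decisions the thread converges on, keep the native 8 kHz rate and use asynchronous (long-running) recognition for audio over 60 seconds, can be captured in a small helper. The function name and the returned dict are illustrative, not the Speech-to-Text client API.

```python
def recognition_settings(duration_seconds, native_sample_rate_hz):
    """Pick Speech-to-Text settings per the guidance cited above: send the
    native sample rate (avoid re-sampling) and use asynchronous
    (long-running) recognition for audio longer than 60 seconds."""
    return {
        "sample_rate_hertz": native_sample_rate_hz,  # 8000 for telephony audio
        "method": ("long_running_recognize"
                   if duration_seconds > 60 else "recognize"),
    }

# The recordings in the question: 8 kHz, more than one minute long.
settings = recognition_settings(duration_seconds=90, native_sample_rate_hz=8000)
```

This yields the native 8000 Hz rate with long-running recognition, i.e. option B.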

Topic 1 Question 283

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 283 discussion

You work for a multinational organization that has recently begun operations in Spain. Teams within your organization will need to work with various Spanish documents, such as business, legal, and financial documents. You want to use machine learning to help your organization get accurate translations quickly and with the least effort. Your organization does not require domain-specific terms or jargon. What should you do?

  • A. Create a Vertex AI Workbench notebook instance. In the notebook, extract sentences from the documents, and train a custom AutoML text model.
  • B. Use Google Translate to translate 1,000 phrases from Spanish to English. Using these translated pairs, train a custom AutoML Translation model.
  • C. Use the Document Translation feature of the Cloud Translation API to translate the documents.
  • D. Create a Vertex AI Workbench notebook instance. In the notebook, convert the Spanish documents into plain text, and create a custom TensorFlow seq2seq translation model.
Suggested Answer: C 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 8 months ago
Selected Answer: C
My Answer: C This option provides a straightforward solution for translating various types of documents (business, legal, financial) quickly and with minimal effort. It leverages Google's Cloud Translation API, which is designed specifically for tasks like this and eliminates the need for manual training or customization. https://cloud.google.com/translate/docs
upvoted 6 times
...
carolctech
Most Recent 1 year ago
Selected Answer: C
The Document Translation feature of the Cloud Translation API is the quicker solution, and it will lead to the least effort as required in the statement. Since your organization does not require domain-specific terms or jargon, no custom solution is needed in this case, which confirms C as the best option.
upvoted 2 times
...
VinaoSilva
1 year, 4 months ago
Selected Answer: C
"translations quickly and with the least effort" = Cloud Translation API
upvoted 2 times
...
fitri001
1 year, 6 months ago
Selected Answer: C
Cloud Translation API - Document Translation: This pre-built service is specifically designed for translating large volumes of documents while preserving the document structure and formatting. It supports various languages, including Spanish, and offers high accuracy for general-purpose translations without domain-specific requirements. Least Effort: Cloud Translation API requires minimal setup. You can directly submit your Spanish documents to the API and receive translated versions in English. There's no need for custom model training or data preparation.
upvoted 2 times
...
omermahgoub
1 year, 7 months ago
Selected Answer: C
Leverage Document Translation in Cloud Translation API
upvoted 1 times
...

Topic 1 Question 284

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 284 discussion

You have a custom job that runs on Vertex AI on a weekly basis. The job is implemented using a proprietary ML workflow that produces the datasets, models, and custom artifacts, and sends them to a Cloud Storage bucket. Many different versions of the datasets and models were created. Due to compliance requirements, your company needs to track which model was used for making a particular prediction, and needs access to the artifacts for each model. How should you configure your workflows to meet these requirements?

  • A. Use the Vertex AI Metadata API inside the custom job to create context, execution, and artifacts for each model, and use events to link them together.
  • B. Create a Vertex AI experiment, and enable autologging inside the custom job.
  • C. Configure a TensorFlow Extended (TFX) ML Metadata database, and use the ML Metadata API.
  • D. Register each model in Vertex AI Model Registry, and use model labels to store the related dataset and model information.
Suggested Answer: A 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 2 months ago
Selected Answer: A
My Answer: A Focus on “Due to compliance requirements, your company needs to track which model was used for making a particular prediction” and “workflow that produces the datasets, models, and custom artifacts, and sends them to a Cloud Storage bucket”, use Vertex AI Metadata API is the best approach.
upvoted 5 times
pinimichele01
1 year ago
Where did you find the question? Did you pass the exam?
upvoted 1 times
...
...
omermahgoub
Highly Voted 1 year ago
Selected Answer: A
Track Lineage with Vertex AI Metadata API
upvoted 5 times
...
emsherff
Most Recent 1 year, 1 month ago
Selected Answer: A
A - Vertex AI Metadata API provides low-level primitives for creating custom metadata entities and relationships (contexts, executions, artifacts, and events). B - Autologging might not capture all the custom artifacts your job produces.
upvoted 2 times
...
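The context/execution/artifact/event model that the Metadata API exposes can be mimicked in a few lines to show how a prediction is traced back to its model and training dataset. This is a pure-Python stand-in with made-up names, not the Vertex AI SDK.

```python
class Lineage:
    """In-memory stand-in for ML metadata lineage: executions consume input
    artifacts and produce output artifacts, linked by typed events."""
    def __init__(self):
        self.events = []  # (execution, direction, artifact)

    def record(self, execution, inputs, outputs):
        for a in inputs:
            self.events.append((execution, "input", a))
        for a in outputs:
            self.events.append((execution, "output", a))

    def producers_of(self, artifact):
        return [e for e, d, a in self.events if d == "output" and a == artifact]

    def inputs_of(self, execution):
        return [a for e, d, a in self.events if e == execution and d == "input"]

lin = Lineage()
lin.record("train-job-2024-07-01", inputs=["dataset-v3"], outputs=["model-v7"])
lin.record("prediction-req-42", inputs=["model-v7"], outputs=["prediction-42"])

# Compliance query: which model made prediction-42, and which dataset trained it?
model = lin.inputs_of(lin.producers_of("prediction-42")[0])[0]
dataset = lin.inputs_of(lin.producers_of(model)[0])[0]
```

Walking the event graph backwards is exactly the audit trail the compliance requirement asks for.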

Topic 1 Question 285

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 285 discussion

You have recently developed a custom model for image classification by using a neural network. You need to automatically identify the values for learning rate, number of layers, and kernel size. To do this, you plan to run multiple jobs in parallel to identify the parameters that optimize performance. You want to minimize custom code development and infrastructure management. What should you do?

  • A. Train an AutoML image classification model.
  • B. Create a custom training job that uses the Vertex AI Vizier SDK for parameter optimization.
  • C. Create a Vertex AI hyperparameter tuning job.
  • D. Create a Vertex AI pipeline that runs different model training jobs in parallel.
Suggested Answer: C 🗳️

Comments

guilhermebutzke
Highly Voted 1 year, 8 months ago
Selected Answer: C
My Answer: C Vertex AI provides a service for hyperparameter tuning which allows you to specify the hyperparameters you want to optimize, such as learning rate, number of layers, and kernel size, and then it automatically runs multiple training jobs with different combinations of these hyperparameters to find the configuration that maximizes performance.
upvoted 9 times
...
VinaoSilva
Most Recent 1 year, 4 months ago
Selected Answer: C
https://cloud.google.com/vertex-ai/docs/training/using-hyperparameter-tuning
upvoted 3 times
...
omermahgoub
1 year, 7 months ago
Selected Answer: C
Leverage Vertex AI Hyperparameter Tuning
upvoted 2 times
pinimichele01
1 year, 6 months ago
why not b?
upvoted 1 times
YushiSato
1 year, 3 months ago
Additional code is required to use the Vertex AI Vizier SDK.
upvoted 1 times
...
...
...
alfieroy16
1 year, 7 months ago
Selected Answer: C
True that
upvoted 1 times
...
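What the hyperparameter tuning service automates can be approximated by a toy search loop: sample hyperparameters per trial and keep the set with the lowest loss. The loss function below is synthetic and purely for illustration; the managed service also adds smarter search strategies and runs trials in parallel.

```python
import random

def toy_loss(learning_rate, num_layers, kernel_size):
    """Synthetic stand-in for validation loss; minimized near
    learning_rate=0.01, num_layers=4, kernel_size=3."""
    return ((learning_rate - 0.01) ** 2 * 1e4
            + (num_layers - 4) ** 2
            + (kernel_size - 3) ** 2)

def tune(n_trials, seed=0):
    """Random-search stand-in for the tuning service: each iteration is one
    trial (the service would run them in parallel); keep the lowest loss."""
    rng = random.Random(seed)
    best_loss, best_params = float("inf"), None
    for _ in range(n_trials):
        params = {"learning_rate": rng.uniform(1e-4, 1e-1),
                  "num_layers": rng.randint(1, 8),
                  "kernel_size": rng.choice([1, 3, 5, 7])}
        loss = toy_loss(**params)
        if loss < best_loss:
            best_loss, best_params = loss, params
    return best_loss, best_params

best_loss, best_params = tune(n_trials=100)
```

With Vertex AI hyperparameter tuning you only declare the search space and metric; this loop, the infrastructure, and the parallelism are managed for you.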

Topic 1 Question 286

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 286 discussion

You work for a company that builds bridges for cities around the world. To track the progress of projects at the construction sites, your company has set up cameras at each location. Each hour, the cameras take a picture that is sent to a Cloud Storage bucket. A team of specialists reviews the images, filters important ones, and then annotates specific objects in them. You want to propose using an ML solution that will help the company scale and reduce costs. You need the solution to have minimal up-front cost. What method should you propose?

  • A. Train an AutoML object detection model to annotate the objects in the images to help specialists with the annotation task.
  • B. Use the Cloud Vision API to automatically annotate objects in the images to help specialists with the annotation task.
  • C. Create a BigQuery ML classification model to classify important images. Use the model to predict which new images are important to help specialists with the filtering task.
  • D. Use Vertex AI to train an open source object detection to annotate the objects in the images to help specialists with the annotation task.
Suggested Answer: B 🗳️

Comments

Fer660
2 months, 1 week ago
Selected Answer: B
I support B Not A: training AutoML jobs is not cheap, as I recently and painfully found out! We are clearly looking for a low upfront cost. B will do nicely! The objects you will find in a construction site are plain-vanilla and Vision API should handle them well. The remainder can be added by the human annotators. Not C: BigQuery is not going to help with image tasks, think tabular data for BQ Not D: Need to train, this will incur upfront costs.
upvoted 2 times
...
strafer
9 months, 2 weeks ago
Selected Answer: A
Cloud Vision API: Pay-as-you-go and Ready-to-use: The Cloud Vision API offers pre-trained models for object detection (and many other image analysis tasks). It's a pay-as-you-go service, meaning you only pay for the API calls you make. This translates to minimal up-front cost since there's no model training or infrastructure setup required on your end. You can immediately start using the API with your existing image data.
upvoted 1 times
el_vampiro
2 months ago
Means B, not A
upvoted 1 times
...
...
thescientist
10 months, 2 weeks ago
Selected Answer: B
AutoML requires training data and incurs training costs - for no upfront cost: B
upvoted 3 times
...
vladik820
10 months, 4 weeks ago
Selected Answer: B
Cloud Vision API - Pay-per-use based on the number of images processed. No training required – it's a pre-trained API.
upvoted 2 times
...
Omi_04040
11 months ago
Selected Answer: A
Since we have a corpus of images and custom labels, the Cloud Vision API won't help; it is also not advisable to use BigQuery ML classification for image data. Hence the answer is A.
upvoted 2 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: A
While the Vision API can detect objects, it might not be as accurate or specific as a custom-trained model for this particular use case (bridge construction).
upvoted 2 times
...

Topic 1 Question 287

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 287 discussion

You are tasked with building an MLOps pipeline to retrain tree-based models in production. The pipeline will include components related to data ingestion, data processing, model training, model evaluation, and model deployment. Your organization primarily uses PySpark-based workloads for data preprocessing. You want to minimize infrastructure management effort. How should you set up the pipeline?

  • A. Set up a TensorFlow Extended (TFX) pipeline on Vertex AI Pipelines to orchestrate the MLOps pipeline. Write a custom component for the PySpark-based workloads on Dataproc.
  • B. Set up a Vertex AI Pipelines to orchestrate the MLOps pipeline. Use the predefined Dataproc component for the PySpark-based workloads.
  • C. Set up Kubeflow Pipelines on Google Kubernetes Engine to orchestrate the MLOps pipeline. Write a custom component for the PySpark-based workloads on Dataproc.
  • D. Set up Cloud Composer to orchestrate the MLOps pipeline. Use Dataproc workflow templates for the PySpark-based workloads in Cloud Composer.
Suggested Answer: B 🗳️

Comments

Pau1234
11 months ago
Selected Answer: B
minimize infrastructure management effort -- hence B
upvoted 1 times
...
Omi_04040
11 months ago
Selected Answer: B
A: rejected because it requires writing a custom component for the PySpark-based workloads. C: Kubeflow Pipelines is not a managed service, and the question mentions 'minimize infrastructure management effort'. D:
upvoted 1 times
Omi_04040
11 months ago
D: Cloud Composer as the orchestrator is an overhead. Hence B.
upvoted 1 times
...
...
AB_C
11 months, 2 weeks ago
Selected Answer: B
This is the most suitable approach
upvoted 2 times
...
carolctech
1 year ago
Selected Answer: B
B) Best option due to higher ease of use, integration with existing PySpark infrastructure (via Dataproc) and minimal infrastructure management overhead, because: Vertex AI Pipelines is fully managed, minimizing infra management effort and natively integrated with Dataproc for PySpark (while Composer is not); Dataproc’s predefined component for PySpark workload reduces effort and error probability; It is suitable for tree-based models (other options are too, but with more effort)
upvoted 2 times
...

Topic 1 Question 288

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 288 discussion

You have developed an AutoML tabular classification model that identifies high-value customers who interact with your organization's website. You plan to deploy the model to a new Vertex AI endpoint that will integrate with your website application. You expect higher traffic to the website during nights and weekends. You need to configure the model endpoint's deployment settings to minimize latency and cost. What should you do?

  • A. Configure the model deployment settings to use an n1-standard-32 machine type.
  • B. Configure the model deployment settings to use an n1-standard-4 machine type. Set the minReplicaCount value to 1 and the maxReplicaCount value to 8.
  • C. Configure the model deployment settings to use an n1-standard-4 machine type and a GPU accelerator. Set the minReplicaCount value to 1 and the maxReplicaCount value to 4.
  • D. Configure the model deployment settings to use an n1-standard-8 machine type and a GPU accelerator.
Suggested Answer: B 🗳️

Comments

el_vampiro
2 months ago
Selected Answer: B
https://cloud.google.com/vertex-ai/docs/predictions/configure-compute#gpus
upvoted 1 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: B
A (n1-standard-32): This is a much larger machine type and will likely be more expensive than necessary for your model. It could lead to unnecessary costs, especially during periods of low traffic. C and D (GPU Accelerators): While GPUs can be beneficial for some models, they are generally not required for tabular models. Adding a GPU would increase the cost without providing significant performance gains.
upvoted 3 times
...
carolctech
1 year ago
Selected Answer: B
B) This option provides the most cost-effective and efficient solution because: 1) Uses a suitably powerful machine type (n1-standard-4 machine) 2) Autoscales with minReplicaCount and maxReplicaCount to adapt to the fluctuating traffic 3) A larger machine type or accelerator is unnecessary. GPU provide better performance for DL models with massive datasets and complex architectures, not for tabular classification models.
upvoted 2 times
...
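The effect of minReplicaCount/maxReplicaCount is roughly a clamp on the replicas demanded by traffic. The sketch below is simplified: real Vertex AI autoscaling targets CPU or accelerator duty cycle rather than raw request rate, and the per-replica capacity of 20 requests/s is an assumed number for illustration.

```python
import math

def target_replicas(requests_per_second, rps_per_replica,
                    min_replicas, max_replicas):
    """Simplified autoscaling rule: provision enough replicas for the
    current load, clamped to [min_replicas, max_replicas]."""
    needed = math.ceil(requests_per_second / rps_per_replica)
    return max(min_replicas, min(max_replicas, needed))

# Option B: n1-standard-4, minReplicaCount=1, maxReplicaCount=8.
quiet_weekday = target_replicas(5, rps_per_replica=20,
                                min_replicas=1, max_replicas=8)
busy_weekend = target_replicas(300, rps_per_replica=20,
                               min_replicas=1, max_replicas=8)
```

During quiet hours the endpoint idles at one replica (the cost floor); during night and weekend peaks it scales out to at most eight (the latency ceiling), which is why B beats a single oversized machine.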

Topic 1 Question 289

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 289 discussion

You developed a BigQuery ML linear regressor model by using a training dataset stored in a BigQuery table. New data is added to the table every minute. You are using Cloud Scheduler and Vertex AI Pipelines to automate hourly model training, and use the model for direct inference. The feature preprocessing logic includes quantile bucketization and MinMax scaling on data received in the last hour. You want to minimize storage and computational overhead. What should you do?

  • A. Preprocess and stage the data in BigQuery prior to feeding it to the model during training and inference.
  • B. Use the TRANSFORM clause in the CREATE MODEL statement in the SQL query to calculate the required statistics.
  • C. Create a component in the Vertex AI Pipelines directed acyclic graph (DAG) to calculate the required statistics, and pass the statistics on to subsequent components.
  • D. Create SQL queries to calculate and store the required statistics in separate BigQuery tables that are referenced in the CREATE MODEL statement.
Suggested Answer: B 🗳️

Comments

Fer660
2 months, 1 week ago
Selected Answer: C
I disagree with B, because the MinMax scaling requires the context of the last hour's worth of entries, and TRANSFORM can't do that. D would do the trick, except that we are asked to minimize additional storage. So we have to settle for C, although in practice the overhead of D would be so low as to be insignificant for BQ.
upvoted 1 times
...
Wuthuong1234
8 months, 2 weeks ago
Selected Answer: B
B is the right solution. Keep in mind that it is asking for a solution where you "minimize storage and computational overhead". You end up storing more data with A and D. While in C you create more computational overhead. All solutions would work perfectly fine, but B matches best with the requirements in the question.
upvoted 3 times
...
Ankit267
10 months, 2 weeks ago
Selected Answer: B
BQ is sufficient
upvoted 2 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: A
While the TRANSFORM clause can perform preprocessing, it's applied during model creation, not for inference. You'll need to recalculate statistics for each inference request, increasing computational overhead.
upvoted 1 times
Omi_04040
11 months ago
This is wrong. This tutorial introduces data analysts to BigQuery ML. BigQuery ML enables users to create and execute machine learning models in BigQuery using SQL queries. This tutorial introduces feature engineering by using the TRANSFORM clause. Using the TRANSFORM clause, you can specify all preprocessing during model creation. The preprocessing is automatically applied during the prediction and evaluation phases of machine learning. https://cloud.google.com/bigquery/docs/bigqueryml-transform
upvoted 2 times
...
...
shubhachandra
11 months, 2 weeks ago
Selected Answer: B
The TRANSFORM clause in BigQuery ML allows you to directly define feature preprocessing logic (such as quantile bucketization and MinMax scaling) within the SQL query itself. This approach minimizes storage and computational overhead because: No additional storage: Statistics for preprocessing are calculated on-the-fly during model training and inference, without needing to store preprocessed data or statistics separately. Integrated workflow: The preprocessing logic is tightly coupled with the model creation process, ensuring consistency between training and inference without external dependencies.
upvoted 4 times
...
lunalongo
11 months, 3 weeks ago
Selected Answer: B
B is the best option because: 1) TRANSFORM saves processing, storage and computation by performing feature preprocessing directly within the CREATE MODEL. 2) This method integrates preprocessing with model training, streamlining the entire process.
upvoted 2 times
...
f084277
12 months ago
Selected Answer: C
Docs say BQ is not suitable for full-pass transformations such as MinMax scaling.
upvoted 2 times
...
carolctech
1 year ago
Selected Answer: A
A) Preprocessing and staging the data in BigQuery before training and inference, is the most efficient approach because: 1) You can use BQ’s optimized processing by preprocessing data before training 2) Avoiding redundant calculations, by directly using the preprocessed data (already bucketized and scaled) for training and inference; 3) Reducing storage by keeping only preprocessed data, not raw data and statistics separately.
upvoted 1 times
...
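The two transforms in the question correspond to BigQuery ML's ML.QUANTILE_BUCKETIZE and ML.MIN_MAX_SCALER preprocessing functions inside a TRANSFORM clause. Their arithmetic can be checked with a simplified pure-Python re-implementation (the boundary choice below is a sketch, not BigQuery's exact algorithm):

```python
def min_max_scale(values):
    """MinMax scaling to [0, 1] over the column's observed range."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against a constant column
    return [(v - lo) / span for v in values]

def quantile_bucketize(values, num_buckets):
    """Assign each value to one of num_buckets roughly equal-count buckets,
    with boundaries taken at the empirical quantiles of the data."""
    order = sorted(values)
    bounds = [order[(i * len(order)) // num_buckets]
              for i in range(1, num_buckets)]
    return [sum(v >= b for b in bounds) for v in values]

scaled = min_max_scale([10, 20, 30, 40])
buckets = quantile_bucketize([1, 2, 3, 4, 5, 6, 7, 8], num_buckets=4)
```

Both transforms need full-pass statistics (min/max, quantiles) over the training window, which is the crux of the thread's debate about where those statistics should be computed and stored.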

Topic 1 Question 290

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 290 discussion

You developed a Python module by using Keras to train a regression model. You developed two model architectures, linear regression and deep neural network (DNN), within the same module. You are using the training_method argument to select one of the two methods, and you are using the learning_rate and num_hidden_layers arguments in the DNN. You plan to use Vertex AI's hypertuning service with a budget to perform 100 trials. You want to identify the model architecture and hyperparameter values that minimize training loss and maximize model performance. What should you do?

  • A. Run one hypertuning job for 100 trials. Set num_hidden_layers as a conditional hyperparameter based on its parent hyperparameter training_method, and set learning_rate as a non-conditional hyperparameter.
  • B. Run two separate hypertuning jobs, a linear regression job for 50 trials, and a DNN job for 50 trials. Compare their final performance on a common validation set, and select the set of hyperparameters with the least training loss.
  • C. Run one hypertuning job with training_method as the hyperparameter for 50 trials. Select the architecture with the lowest training loss, and further hypertune it and its corresponding hyperparameters for 50 trials.
  • D. Run one hypertuning job for 100 trials. Set num_hidden_layers and learning_rate as conditional hyperparameters based on their parent hyperparameter training_method.
Suggested Answer: A 🗳️

Comments

dija123
1 month ago
Selected Answer: D
I agree with D.
upvoted 1 times
...
Fer660
2 months, 1 week ago
Selected Answer: D
I don't think that "same module" automatically means that both are being trained using gradient descent. Furthermore, the learning rate for the DNN would not be the same as the optimal learning rate for the regression with SGD. Therefore, both learning_rate and num_layers would be decided on the second tier, conditional on the architecture.
upvoted 3 times
...
kaneup
7 months, 2 weeks ago
Selected Answer: D
this is D
upvoted 2 times
...
River3000
8 months ago
This should be D. The question states that the 'linear regression and deep neural network (DNN)' live 'within the same module', which typically means that even the linear regression model is trained using gradient-based optimization (such as SGD or Adam) rather than a closed-form solution. So the phrase "within the same module" implies that the linear model also relies on gradient descent, and thus the learning_rate parameter is applicable for training both models, even though the DNN additionally uses the num_hidden_layers parameter for its architecture.
upvoted 1 times
...
Ankit267
10 months, 2 weeks ago
Selected Answer: A
A & D are the candidates, for obvious reasons. Why A? A DNN with one hidden layer is equivalent to a linear regression model, so num_hidden_layers is conditional, while learning_rate can be a hyperparameter for both (one hidden layer, i.e. linear regression, and more than one hidden layer, i.e. DNN). Therefore A is the right answer.
upvoted 1 times
...
Pau1234
11 months ago
Selected Answer: A
Agree with Omi_04040. num_hidden_layers is only relevant to the DNN model and not the linear regression model, according to the documentation
upvoted 3 times
...
Omi_04040
11 months ago
Selected Answer: A
Answer is A since 'learning rate' cannot be shared. This question is a literal spinoff from this paragraph: https://cloud.google.com/vertex-ai/docs/training/hyperparameter-tuning-overview#conditional_hyperparameters
upvoted 3 times
...
rajshiv
11 months, 1 week ago
Selected Answer: D
A is incorrect because both num_hidden_layers and learning_rate are hyperparameters specific to the DNN model. Since both hyperparameters need to be conditional on training_method being DNN, making only one of them conditional is not sufficient. The problem has two model architectures: linear regression and DNN. Depending on the model architecture, the hyperparameters change: 1) For DNN, the hyperparameters are num_hidden_layers and learning_rate while 2) For linear regression, these hyperparameters are not relevant. Hence I vote D.
upvoted 3 times
...
lunalongo
11 months, 3 weeks ago
A is the best option because running one single job with conditional logic added to the hyperparameter settings avoids unnecessary compute usage and comparison effort. Only num_hidden_layers needs to be set as a conditional hyperparameter under training_method; no explicit conditional logic is needed for learning_rate -- the latter is intelligently ignored by Vertex AI when linear regression is the training_method. B, C and D are less suitable: B and C run 2 separate jobs, while D runs only one job but its hyperparameter tuning strategy adds redundant processing, even if it's true that the learning_rate is irrelevant for linear regression methods. The underlying logic: as a STRUCTURAL hyperparameter, num_hidden_layers is intrinsically tied to the DNN's architecture definition; as a TRAINING hyperparameter, learning_rate is linked to the training process, not directly tied to the architecture definition.
upvoted 4 times
...
carolctech
1 year ago
Selected Answer: A
The best approach is A and here's why: The use of the 100 trials in a single job by using conditional hyperparameters maximizes budget efficiency. The number of hidden layers should be conditional, because it is relevant only for NON-LINEAR models like neural networks (which is DNN's case) and not for linear models -- where hidden layers don't exist. Learning rate is relevant for both models, unless the question stated that the regression model used a closed-form solution, not a gradient-based optimization method.
upvoted 4 times
...
JDpmle2024
1 year ago
Selected Answer: D
This would allow you to first set the type of job, and only after that any other parameters. So first, select training_method. If training_method is DNN, then you specify the other parameters.
upvoted 1 times
...

Topic 1 Question 291

You work for a hospital. You received approval to collect the necessary patient data, and you trained a Vertex AI tabular AutoML model that calculates patients' risk score for hospital admission. You deployed the model. However, you're concerned that patient demographics might change over time and alter the feature interactions and impact prediction accuracy. You want to be alerted if feature interactions change, and you want to understand the importance of the features for the predictions. You want your alerting approach to minimize cost. What should you do?

  • A. Create a feature drift monitoring job. Set the sampling rate to 1 and the monitoring frequency to weekly.
  • B. Create a feature drift monitoring job. Set the sampling rate to 0.1 and the monitoring frequency to weekly.
  • C. Create a feature attribution drift monitoring job. Set the sampling rate to 1 and the monitoring frequency to weekly.
  • D. Create a feature attribution drift monitoring job. Set the sampling rate to 0.1 and the monitoring frequency to weekly.
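The cost side of the question is simple arithmetic: a monitoring job only analyzes the sampled fraction of prediction requests. A back-of-envelope sketch (the request volume is a made-up example, not from the question):

```python
# Sketch: approximate how many prediction requests a monitoring job
# analyzes per window, given a sampling rate. A rate of 0.1 analyzes
# one tenth of the traffic that a rate of 1 would, at roughly one
# tenth of the analysis cost.
def monitored_requests(requests_per_window: int, sampling_rate: float) -> int:
    """Requests analyzed per monitoring window under the given sampling rate."""
    if not 0.0 < sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be in (0, 1]")
    return round(requests_per_window * sampling_rate)
```

For example, with a hypothetical 1,000,000 weekly predictions, a 0.1 sampling rate analyzes 100,000 requests while a rate of 1 analyzes all 1,000,000, which is why options B and D are the cost-minimizing variants.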
Suggested Answer: D 🗳️

Comments

Omi_04040
11 months ago
Selected Answer: D
The question is specifically concerned about changes in feature interactions and their impact on predictions. Feature attribution drift monitoring directly addresses this by tracking how the importance of different features (and their interactions) changes over time. https://cloud.google.com/vertex-ai/docs/model-monitoring/monitor-explainable-ai
upvoted 3 times
...
e821027
11 months, 2 weeks ago
Selected Answer: D
But the interest is also to understand the importance of features for the predictions.
upvoted 2 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: D
Why other options are less suitable: A and B (Feature Drift Monitoring): While basic feature drift monitoring can detect changes in feature distributions, it doesn't directly address your concern about changes in feature interactions and their impact on predictions. C (Sampling Rate of 1): Analyzing 100% of the prediction requests for feature attribution drift can be expensive, especially if you have high traffic.
upvoted 2 times
...
JDpmle2024
1 year ago
Selected Answer: B
This is feature drift (features are changing) and not feature attribution drift (features are having different effects on the prediction).
upvoted 1 times
...

Topic 1 Question 292

You are developing a TensorFlow Extended (TFX) pipeline with standard TFX components. The pipeline includes data preprocessing steps. After the pipeline is deployed to production, it will process up to 100 TB of data stored in BigQuery. You need the data preprocessing steps to scale efficiently, publish metrics and parameters to Vertex AI Experiments, and track artifacts by using Vertex ML Metadata. How should you configure the pipeline run?

  • A. Run the TFX pipeline in Vertex AI Pipelines. Configure the pipeline to use Vertex AI Training jobs with distributed processing.
  • B. Run the TFX pipeline in Vertex AI Pipelines. Set the appropriate Apache Beam parameters in the pipeline to run the data preprocessing steps in Dataflow.
  • C. Run the TFX pipeline in Dataproc by using the Apache Beam TFX orchestrator. Set the appropriate Vertex AI permissions in the job to publish metadata in Vertex AI.
  • D. Run the TFX pipeline in Dataflow by using the Apache Beam TFX orchestrator. Set the appropriate Vertex AI permissions in the job to publish metadata in Vertex AI.
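Option B hinges on the fact that TFX data-processing components accept Apache Beam pipeline arguments, and setting the Dataflow runner there is what makes preprocessing scale. A sketch of such an argument list; the project, region, bucket, and worker counts are placeholders, not values from the question:

```python
# Sketch: Beam arguments that direct TFX data-processing components to
# run on Dataflow. Project, region, bucket, and worker limits below are
# placeholder assumptions.
GOOGLE_CLOUD_PROJECT = "my-project"   # placeholder
GCS_BUCKET = "gs://my-bucket"         # placeholder

beam_pipeline_args = [
    "--runner=DataflowRunner",                     # scale out on Dataflow
    f"--project={GOOGLE_CLOUD_PROJECT}",
    "--region=us-central1",
    f"--temp_location={GCS_BUCKET}/tmp",           # scratch space for Dataflow
    "--autoscaling_algorithm=THROUGHPUT_BASED",    # autoscale workers
    "--max_num_workers=64",
]
```

The TFX pipeline itself still runs in Vertex AI Pipelines, which is what gives you the Vertex AI Experiments and Vertex ML Metadata integration; only the heavy Beam steps are pushed out to Dataflow.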
Suggested Answer: B 🗳️

Comments

AB_C
Highly Voted 11 months, 2 weeks ago
Selected Answer: B
A (Vertex AI Training jobs): While Vertex AI Training jobs are useful for model training, they are not the primary way to scale data preprocessing within a TFX pipeline. C and D (Dataproc and Dataflow with Apache Beam TFX orchestrator): While you can run TFX pipelines on Dataproc or Dataflow directly, using Vertex AI Pipelines as the orchestrator provides better integration with Vertex AI services and simplifies metadata tracking and experiment management.
upvoted 5 times
...

Topic 1 Question 293

You are developing a batch process that will train a custom model and perform predictions. You need to be able to show lineage for both your model and the batch predictions. What should you do?

  • A. 1. Upload your dataset to BigQuery.
    2. Use a Vertex AI custom training job to train your model.
    3. Generate predictions by using Vertex AI SDK custom prediction routines.
  • B. 1. Use Vertex AI Experiments to evaluate model performance during training.
    2. Register your model in Vertex AI Model Registry.
    3. Generate batch predictions in Vertex AI.
  • C. 1. Create a Vertex AI managed dataset.
    2. Use a Vertex AI training pipeline to train your model.
    3. Generate batch predictions in Vertex AI.
  • D. 1. Use a Vertex AI Pipelines custom training job component to train your model.
    2. Generate predictions by using a Vertex AI Pipelines model batch predict component.
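"Lineage" here means being able to walk from any artifact back to the steps and inputs that produced it, which pipeline-tracked runs record automatically. A toy stdlib sketch of that idea (the artifact and step names are illustrative, not a real Vertex ML Metadata schema):

```python
# Conceptual sketch of the lineage a pipeline run records: each
# artifact points back at the step that produced it and the input
# artifacts it consumed, so batch predictions trace to model and data.
LINEAGE = {
    "dataset":           {"produced_by": "data-import",         "inputs": []},
    "model":             {"produced_by": "custom-training-job", "inputs": ["dataset"]},
    "batch_predictions": {"produced_by": "batch-predict",       "inputs": ["model", "dataset"]},
}

def upstream(artifact: str) -> set:
    """All ancestor artifacts of `artifact` in the lineage graph."""
    seen = set()
    stack = list(LINEAGE[artifact]["inputs"])
    while stack:
        a = stack.pop()
        if a not in seen:
            seen.add(a)
            stack.extend(LINEAGE[a]["inputs"])
    return seen
```

With option D both the training step and the batch predict step run as pipeline components, so both edges of this graph are captured without custom bookkeeping.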
Suggested Answer: D 🗳️

Comments

MarcoPellegrino
10 months ago
Selected Answer: D
A: Vertex AI SDK custom prediction routines do not provide lineage. B: it focuses more on experiments and does not provide lineage the way Vertex AI Pipelines does. C: might appear correct, especially given the use of a managed dataset, but a generic Vertex AI training pipeline does not provide lineage for a custom model as well as the custom training job component of D.
upvoted 1 times
...
AB_C
11 months, 2 weeks ago
Selected Answer: D
A (Vertex AI custom training job and custom prediction routines): This approach lacks the built-in lineage tracking capabilities of Vertex AI Pipelines. You would need to implement custom mechanisms to log and track the relevant metadata. B (Vertex AI Experiments and Model Registry): These are valuable tools, but they focus more on experiment management and model versioning. They don't provide the same level of workflow and lineage tracking as pipelines. C (Vertex AI managed dataset and batch prediction): While helpful, this doesn't provide the same level of granularity and traceability as pipelines for tracking the complete lineage, especially the training process.
upvoted 3 times
...

Topic 1 Question 294

You work for a company that sells corporate electronic products to thousands of businesses worldwide. Your company stores historical customer data in BigQuery. You need to build a model that predicts customer lifetime value over the next three years. You want to use the simplest approach to build the model. What should you do?

  • A. Create a Vertex AI Workbench notebook. Use IPython magic to run the CREATE MODEL statement to create an ARIMA model.
  • B. Access BigQuery Studio in the Google Cloud console. Run the CREATE MODEL statement in the SQL editor to create an AutoML regression model.
  • C. Create a Vertex AI Workbench notebook. Use IPython magic to run the CREATE MODEL statement to create an AutoML regression model.
  • D. Access BigQuery Studio in the Google Cloud console. Run the CREATE MODEL statement in the SQL editor to create an ARIMA model.
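Option B's "simplest approach" is a single SQL statement in the BigQuery Studio editor. A sketch of what that statement could look like, held in a Python string for illustration; the dataset, table, and label column names are placeholders:

```python
# The kind of statement option B runs in the BigQuery Studio SQL
# editor. Dataset, table, and column names below are placeholder
# assumptions, not values from the question.
CREATE_MODEL_SQL = """
CREATE OR REPLACE MODEL `my_dataset.cltv_model`
OPTIONS (
  model_type = 'AUTOML_REGRESSOR',    -- AutoML regression, per option B
  input_label_cols = ['cltv_3yr']     -- 3-year customer lifetime value label
) AS
SELECT *
FROM `my_dataset.customer_history`
"""
```

No notebook, no Python training code, and no export out of BigQuery, which is what makes B simpler than the Workbench-based options A and C.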
Suggested Answer: B 🗳️

Comments

AB_C
Highly Voted 11 months, 2 weeks ago
Selected Answer: B
A and C (Vertex AI Workbench): While Vertex AI Workbench is a powerful platform for ML development, it requires setting up a notebook environment and writing Python code, which adds complexity compared to using BigQuery ML directly. D (ARIMA model): ARIMA models are specifically designed for time series forecasting. While they might be applicable in some CLTV scenarios, AutoML Regression provides a more general and potentially more accurate solution for predicting CLTV based on various customer features.
upvoted 6 times
...
MarcoPellegrino
Most Recent 10 months ago
Selected Answer: B
Exclude A and C because the question specifies "the simplest approach", and BigQuery Studio is more immediate than Vertex AI Workbench. Both ARIMA and AutoML work as modeling techniques for customer lifetime value; since the question specifies "the simplest approach", B (AutoML) is chosen over D (ARIMA).
upvoted 2 times
...

Topic 1 Question 295

You work at a retail company, and are tasked with developing an ML model to predict product sales. Your company’s historical sales data is stored in BigQuery and includes features such as date, store location, product category, and promotion details. You need to choose the most effective combination of a BigQuery ML model and feature engineering to maximize prediction accuracy. What should you do?

  • A. Use a linear regression model. Perform one-hot encoding on categorical features, and create additional features based on the date, such as day of the week or month.
  • B. Use a boosted tree model. Perform label encoding on categorical features, and transform the date column into numeric values.
  • C. Use an autoencoder model. Perform label encoding on categorical features, and normalize the date column.
  • D. Use a matrix factorization model. Perform one-hot encoding on categorical features, and create interaction features between the store location and product category variables.
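Option A's feature engineering is easy to picture in plain Python: derive day-of-week and month from the date, and one-hot encode categories so no artificial ordering is introduced. A minimal sketch; the category list and column names are assumptions for illustration:

```python
from datetime import date

# Sketch of option A's feature engineering: date-derived features plus
# one-hot encoded categories. The category set is a made-up example.
CATEGORIES = ["electronics", "grocery", "clothing"]

def engineer(row: dict) -> dict:
    """Turn a raw sales row into model-ready features."""
    d = date.fromisoformat(row["date"])
    feats = {
        "day_of_week": d.isoweekday(),  # 1 = Monday .. 7 = Sunday
        "month": d.month,
    }
    for c in CATEGORIES:  # one-hot: no false ordering between categories
        feats[f"category_{c}"] = 1 if row["product_category"] == c else 0
    return feats
```

This is the contrast with option B's label encoding, which would map categories to arbitrary integers and imply an ordering that does not exist.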
Suggested Answer: A 🗳️

Comments

Rafa1312
1 month ago
Selected Answer: A
It's either A or B. I will go with A since this is essentially a linear regression problem, probably.
upvoted 1 times
...
vamgcp
1 month, 3 weeks ago
Selected Answer: A
Not B. A boosted tree model is a good choice for tabular data, but it can be more complex than a linear model for this problem. The main issue with this option is label encoding: it assigns an arbitrary number to each category (e.g., "Monday" = 1, "Tuesday" = 2), which creates a false sense of order that can mislead the model and harm accuracy.
upvoted 1 times
...
Fer660
2 months, 1 week ago
Selected Answer: B
Going for B because it might handle non-linearities better than A.
upvoted 2 times
...
Wuthuong1234
8 months, 2 weeks ago
Selected Answer: A
I would only consider between A and B. I think A is more likely since that option makes better use of the date field, which is arguably the more "efficient" approach. Linear regression tends to be more efficient than boosted trees too.
upvoted 3 times
...
Long_Pham
8 months, 3 weeks ago
Selected Answer: A
I think A, because boosted trees are effective, but in most cases, they rarely transform date columns into numeric values.
upvoted 2 times
...
strafer
9 months, 2 weeks ago
Selected Answer: B
B. Use a boosted tree model. Perform label encoding on categorical features, and transform the date column into numeric values.
upvoted 4 times
...

Topic 1 Question 296

Your organization’s employee onboarding team wants you to build an interactive self-help tool for new employees. The tool needs to receive queries from users and provide answers from the organization’s internal documentation. This documentation is spread across standalone documents such as PDF files. You want to build a solution quickly while minimizing maintenance overhead. What should you do?

  • A. Create a custom chatbot user interface hosted on App Engine. Use Vertex AI to fine-tune a Gemini model on the organization’s internal documentation. Send users’ queries to the fine-tuned model by using the custom chatbot and return the model’s responses to the users.
  • B. Deploy an internal website to a Google Kubernetes Engine (GKE) cluster. Build a search index by ingesting all of the organization’s internal documentation. Use Vertex AI Vector Search to implement a semantic search that retrieves results from the search index based on the query entered into the search box.
  • C. Use Vertex AI Agent Builder to create an agent. Securely index the organization’s internal documentation to the agent’s datastore. Send users’ queries to the agent and return the agent’s grounded responses to the users.
  • D. Deploy an internal website to a Google Kubernetes Engine (GKE) cluster. Organize the relevant internal documentation into sections. Collect user feedback on website content and store it in BigQuery. Request that the onboarding team regularly update the links based on user feedback.
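The core of option C is retrieval-grounded answering: find the most relevant indexed document for a query, then answer from it. A deliberately tiny sketch of just the retrieval step, using bag-of-words term overlap; real Agent Builder grounding uses semantic retrieval over an indexed datastore, and the documents below are invented:

```python
# Toy retrieval step of a grounded self-help agent: score indexed
# documents by term overlap with the query and pick the best match.
# Document names and contents are made-up examples.
DOCS = {
    "benefits.pdf": "health insurance enrollment and dental benefits",
    "it_setup.pdf": "laptop setup vpn access and password reset steps",
}

def retrieve(query: str) -> str:
    """Return the document whose text shares the most terms with the query."""
    q = set(query.lower().split())
    return max(DOCS, key=lambda name: len(q & set(DOCS[name].split())))
```

The point of the managed option is that indexing, retrieval quality, and grounding are handled by the service, so a proof of concept needs no GKE cluster or custom search stack.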
Suggested Answer: C 🗳️

Comments

5091a99
8 months, 1 week ago
Selected Answer: C
Agent Builder for PDFs. As of March 2025, it is not in the Vertex AI GUI, but a separate link within GCP.
upvoted 1 times
...
fra_pavi
10 months, 1 week ago
Selected Answer: C
In my opinion the correct answer is C because I did it for a client.
upvoted 3 times
...
nish2288
10 months, 1 week ago
Selected Answer: C
Using RAG is the easiest solution.
upvoted 1 times
...

Topic 1 Question 297

You work for an ecommerce company that wants to automatically classify products in images to improve user experience. You have a substantial dataset of labeled images depicting various unique products. You need to implement a solution for identifying custom products that is scalable, effective, and can be rapidly deployed. What should you do?

  • A. Develop a rule-based system to categorize the images.
  • B. Use a TensorFlow deep learning model that is trained on the image dataset.
  • C. Use a pre-trained object detection model from Model Garden.
  • D. Use AutoML Vision to train a model using the image dataset.
Suggested Answer: D 🗳️

Comments

hit_cloudie
5 months, 3 weeks ago
Selected Answer: D
AutoML Vision is scalable, fast to deploy, and effective when you already have labeled data.
upvoted 2 times
...

Topic 1 Question 298

Your team is developing a customer support chatbot for a healthcare company that processes sensitive patient information. You need to ensure that all personally identifiable information (PII) captured during customer conversations is protected prior to storing or analyzing the data. What should you do?

  • A. Use the Cloud Natural Language API to identify and redact PII in chatbot conversations.
  • B. Use the Cloud Natural Language API to classify and categorize all data, including PII, in chatbot conversations.
  • C. Use the DLP API to encrypt PII in chatbot conversations before storing the data.
  • D. Use the DLP API to scan and de-identify PII in chatbot conversations before storing the data.
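The difference between options C and D is what happens to the PII: encryption preserves it in recoverable form, while de-identification replaces it before storage. A minimal stdlib sketch of the de-identification idea; the real DLP API uses managed infoType detectors rather than these two hand-written regexes, which are assumptions for illustration:

```python
import re

# Sketch of DLP-style de-identification: detect PII patterns and
# replace them with infoType tokens before the text is stored.
# These two regexes stand in for DLP's managed infoType detectors.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def deidentify(text: str) -> str:
    """Replace each detected PII span with its infoType label."""
    for info_type, pattern in PATTERNS.items():
        text = pattern.sub(f"[{info_type}]", text)
    return text
```

The stored transcript keeps its analytic value (you can still count how often phone numbers come up) while the raw identifiers never reach storage, which is what the question asks for.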
Suggested Answer: D 🗳️

Comments

nnn245bbb
6 months ago
Selected Answer: D
Definitely the DLP API, but de-identification is a better balance of privacy and utility.
upvoted 1 times
...

Topic 1 Question 299

Your team is experimenting with developing smaller, distilled LLMs for a specific domain. You have performed batch inference on a dataset by using several variations of your distilled LLMs and stored the batch inference outputs in Cloud Storage. You need to create an evaluation workflow that integrates with your existing Vertex AI pipeline to assess the performance of the LLM versions while also tracking artifacts. What should you do?

  • A. Develop a custom Python component that reads the batch inference outputs from Cloud Storage, calculates evaluation metrics, and writes the results to a BigQuery table.
  • B. Use a Dataflow component that processes the batch inference outputs from Cloud Storage, calculates evaluation metrics in a distributed manner, and writes the results to a BigQuery table.
  • C. Create a custom Vertex AI Pipelines component that reads the batch inference outputs from Cloud Storage, calculates evaluation metrics, and writes the results to a BigQuery table.
  • D. Use the Automatic side-by-side (AutoSxS) pipeline component that processes the batch inference outputs from Cloud Storage, aggregates evaluation metrics, and writes the results to a BigQuery table.
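A custom pipeline component (option C) would wrap metric logic like the following; the metric choice (exact match) and the output row schema are assumptions for illustration, not prescribed by the question:

```python
# Sketch of the metric logic a custom pipeline component could wrap:
# score each model version's stored batch outputs against references
# and emit one result row per version, ready to load into BigQuery.
def exact_match_rate(predictions: list, references: list) -> float:
    """Fraction of predictions that match the reference (case-insensitive)."""
    assert len(predictions) == len(references)
    hits = sum(p.strip().lower() == r.strip().lower()
               for p, r in zip(predictions, references))
    return hits / len(references)

def evaluate_versions(outputs: dict, references: list) -> list:
    """One result row per distilled-LLM version."""
    return [{"model_version": version, "exact_match": exact_match_rate(preds, references)}
            for version, preds in outputs.items()]
```

Running this inside a Vertex AI Pipelines component is what makes the evaluation inputs and result tables show up as tracked artifacts in Vertex ML Metadata.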
Suggested Answer: C 🗳️

Comments

Fer660
2 months, 1 week ago
Selected Answer: C
We have already computed the batch outputs, so AutoSxS is not the right one.
upvoted 1 times
...
4d742d7
5 months ago
Selected Answer: D
Use the AutoSxS pipeline component to quickly evaluate and compare your distilled LLMs, all integrated with Vertex AI Pipelines, with minimal development overhead and full artifact lineage support.
upvoted 2 times
...
Begum
5 months, 4 weeks ago
Selected Answer: C
A Vertex AI component can be used. No need to complicate the solution.
upvoted 1 times
...
tmpuserx
6 months ago
Selected Answer: D
AutoSxS is meant for model comparison.
upvoted 2 times
...
5091a99
8 months, 1 week ago
Selected Answer: C
Answer C: Vertex AI Pipelines. The flow already includes Pipelines, which allow more flexibility in model training, evaluation, and metadata storage. No need to go outside of the environment.
upvoted 1 times
...

Topic 1 Question 300

You work for a bank. You need to train a model by using unstructured data stored in Cloud Storage that predicts whether credit card transactions are fraudulent. The data needs to be converted to a structured format to facilitate analysis in BigQuery. Company policy requires that data containing personally identifiable information (PII) remain in Cloud Storage. You need to implement a scalable solution that preserves the data’s value for analysis. What should you do?

  • A. Use BigQuery’s authorized views and column-level access controls to restrict access to PII within the dataset.
  • B. Use the DLP API to de-identify the sensitive data before loading it into BigQuery.
  • C. Store the unstructured data in a separate PII-compliant BigQuery database.
  • D. Remove the sensitive data from the files manually before loading them into BigQuery.
Suggested Answer: B 🗳️

Comments

CassiniExam
8 months, 2 weeks ago
Selected Answer: B
B. This is the most effective and scalable solution. The DLP (Data Loss Prevention) API is designed to identify and transform sensitive data.
upvoted 2 times
...

Topic 1 Question 301

You are an ML engineer at a bank. You need to build a solution that provides transparent and understandable explanations for AI-driven decisions for loan approvals, credit limits, and interest rates. You want to build this system to require minimal operational overhead. What should you do?

  • A. Deploy the Learning Interpretability Tool (LIT) on App Engine to provide explainability and visualization of the output.
  • B. Use Vertex Explainable AI to generate feature attributions, and use feature-based explanations for your models.
  • C. Use AutoML Tables with built-in explainability features, and use Shapley values for explainability.
  • D. Deploy pre-trained models from TensorFlow Hub to provide explainability using visualization tools.
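Both contested options (B and C) ultimately surface per-feature attributions, and Shapley values are the canonical form: average each feature's marginal contribution over all orderings. A toy exact computation for a made-up two-feature "risk score" function (everything here is invented for illustration; real attributions come from the managed service):

```python
from itertools import permutations
from math import factorial

# Toy exact Shapley computation for a made-up two-feature scoring
# function. Absent features contribute 0 (the baseline).
def score(features: dict) -> float:
    return 2.0 * features.get("income", 0.0) + 1.0 * features.get("credit_history", 0.0)

def shapley(instance: dict) -> dict:
    """Average each feature's marginal contribution over all orderings."""
    names = list(instance)
    contrib = {n: 0.0 for n in names}
    for order in permutations(names):
        present = {}
        for n in order:                  # add features one at a time,
            before = score(present)      # crediting each with its
            present[n] = instance[n]     # marginal contribution
            contrib[n] += score(present) - before
    return {n: v / factorial(len(names)) for n, v in contrib.items()}
```

A useful sanity property, visible in the test: the attributions sum exactly to the model's output for the instance, which is what makes them auditable explanations for a loan decision.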
Suggested Answer: B 🗳️

Comments

Bardapapa
1 month, 3 weeks ago
Selected Answer: B
As a managed service, it significantly reduces the operational overhead of deploying, maintaining, and scaling a separate explanation tool. Feature attributions (like "Applicant's credit score contributed 30% to the approval decision") are highly valuable for transparent, regulatory-compliant explanations in banking.
upvoted 3 times
...
d83229d
2 months, 2 weeks ago
Selected Answer: B
This is the prime use-case for Vertex Explainable AI. The only difference here would be, like the usual case between AutoML vs Vertex: Customizability vs just using models available on AutoML. I would go with B since it's just as low-overhead but without the customizability
upvoted 2 times
d83229d
2 months, 2 weeks ago
With the customizability*
upvoted 1 times
Fer660
2 months, 1 week ago
B tells us nothing about how the models would be built.
upvoted 2 times
...
...
...
Tara3
4 months, 1 week ago
Selected Answer: C
My answer is C. AutoML Tables automatically builds and trains machine learning models based on your data, requiring minimal manual effort in model selection and hyperparameter tuning. It offers built-in explainability features, including Shapley values, which provide a mathematically sound way to understand the contribution of each feature to the model's prediction. AutoML Tables significantly reduces operational overhead by automating model building and providing explainability without requiring custom code or complex configurations.
upvoted 2 times
...
ricardovazz
4 months, 2 weeks ago
Selected Answer: B
Between B and C, the question essentially comes down to: Do you need the flexibility to customize your ML approach beyond what AutoML provides? For most banking applications, the answer is yes.
upvoted 3 times
...
CassiniExam
8 months, 2 weeks ago
Selected Answer: C
Considering the requirements of transparency, understandability, and minimal operational overhead, C. Use AutoML Tables with built-in explainability features, and use Shapley values for explainability. is the best option. It leverages a managed service with built-in explainability, providing a scalable and low-maintenance solution.
upvoted 3 times
...

Topic 1 Question 302

You are building an application that extracts information from invoices and receipts. You want to implement this application with minimal custom code and training. What should you do?

  • A. Use the Cloud Vision API with TEXT_DETECTION type to extract text from the invoices and receipts, and use a pre-built natural language processing (NLP) model to parse the extracted text.
  • B. Use the Cloud Document AI API to extract information from the invoices and receipts.
  • C. Use Vertex AI Agent Builder with the pre-built Layout Parser model to extract information from the invoices and receipts.
  • D. Train an AutoML Natural Language model to classify and extract information from the invoices and receipts.
Suggested Answer: B 🗳️

Comments

Tara3
4 months, 1 week ago
Selected Answer: B
My answer is B: use the Cloud Document AI API to extract information from the invoices and receipts. The Cloud Document AI service is specifically designed for structured document processing, including invoice and receipt extraction. It uses pre-trained models to identify key fields like dates, amounts, items, and vendor information, minimizing the need for custom training or development.
upvoted 2 times
...
hit_cloudie
5 months, 3 weeks ago
Selected Answer: B
B is correct. This API is purpose-built for structured document processing (e.g., invoices, receipts). It extracts structured fields like total, date, and vendor, requires minimal custom code and no training, and is ideal for this use case.
upvoted 1 times
...
5091a99
8 months, 1 week ago
Selected Answer: B
Answer: B. DocumentAI API is much better at extracting specific fields and returning more repeatable results for specific field extraction than Cloud Vision (which is more directed at Object detection, etc.)
upvoted 2 times
...

Topic 1 Question 303

You work for a media company that operates a streaming movie platform where users can search for movies in a database. The existing search algorithm uses keyword matching to return results. Recently, you have observed an increase in searches using complex semantic queries that include the movies’ metadata such as the actor, genre, and director.

You need to build a revamped search solution that will provide better results, and you need to build this proof of concept as quickly as possible. How should you build the search platform?

  • A. Use a foundational large language model (LLM) from Model Garden as the search platform’s backend.
  • B. Configure Vertex AI Vector Search as the search platform’s backend.
  • C. Use a BERT-based model and host it on a Vertex AI endpoint.
  • D. Create the search platform through Vertex AI Agent Builder.
Show Suggested Answer Hide Answer
Suggested Answer: B 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
5091a99
Highly Voted 8 months, 1 week ago
Selected Answer: B
Answer B. Vector search is more efficient for 'Search' based queries. - A: Makes sense and easily deployable, but this is 'Search' and LLMs typically are for more conversational applications that may not prioritize speed. - C: BERT unnecessary complexity and training. - D: Would work, but Agents are more geared toward conversation and results have higher latency compared to vector search.
upvoted 5 times
dija123
3 weeks, 4 days ago
Vertex AI Vector Search might take hours for the first indexing
upvoted 1 times
...
...
dija123
Most Recent 1 month ago
Selected Answer: D
Vertex AI Agent Builder is a very good service for media search, especially for a quick POC.
upvoted 1 times
...
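The semantic-search idea behind option B can be illustrated with a minimal, self-contained sketch: represent each movie's metadata as an embedding vector and rank candidates by cosine similarity to the query embedding. The vectors below are toy, hand-made stand-ins; in a real system they would come from an embedding model, with the index served by Vertex AI Vector Search.

```python
import math

# Toy "embeddings" standing in for real text-embedding vectors.
movies = {
    "Heat":      [0.9, 0.1, 0.0],   # crime, action
    "Toy Story": [0.0, 0.2, 0.9],   # family, animation
    "Se7en":     [0.8, 0.3, 0.1],   # crime, thriller
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, k=2):
    """Return the k titles whose embeddings are closest to the query."""
    ranked = sorted(movies, key=lambda t: cosine(query_vec, movies[t]), reverse=True)
    return ranked[:k]

# A query embedding for something like "gritty detective thriller".
print(search([0.85, 0.25, 0.05]))
```

The managed service adds approximate nearest-neighbor indexing so this lookup stays fast at millions of vectors, which brute-force cosine scoring cannot do.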

Topic 1 Question 304

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 304 discussion

You are an AI engineer that works for a popular video streaming platform. You built a classification model using PyTorch to predict customer churn. Each week, the customer retention team plans to contact customers that have been identified as at risk of churning with personalized offers. You want to deploy the model while minimizing maintenance effort. What should you do?

  • A. Use Vertex AI’s prebuilt containers for prediction. Deploy the container on Cloud Run to generate online predictions.
  • B. Use Vertex AI’s prebuilt containers for prediction. Deploy the model on Google Kubernetes Engine (GKE), and configure the model for batch prediction.
  • C. Deploy the model to a Vertex AI endpoint, and configure the model for batch prediction. Schedule the batch prediction to run weekly.
  • D. Deploy the model to a Vertex AI endpoint, and configure the model for online prediction. Schedule a job to query this endpoint weekly.
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
qaz09
4 months ago
Selected Answer: C
C & D are minimising maintenance effort. The model will be used weekly (batch predictions) --> C
upvoted 1 times
...
yokoyan
8 months, 1 week ago
Selected Answer: C
(Gemini Explanation) Vertex AI Batch Prediction: This service is specifically designed for batch inference, making it ideal for processing large datasets and generating predictions offline. Scheduled Jobs: Vertex AI allows you to schedule batch prediction jobs, automating the weekly process and eliminating the need for manual intervention. Minimized Maintenance: Vertex AI handles the underlying infrastructure, reducing the maintenance burden compared to managing a Kubernetes cluster or manually querying an online endpoint. Cost Efficiency: Batch prediction is generally more cost-effective for large-scale offline processing than repeatedly querying an online endpoint.
upvoted 2 times
...

Topic 1 Question 305

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 305 discussion

Your company recently migrated several of its ML models to Google Cloud. You have started developing models in Vertex AI. You need to implement a system that tracks model artifacts and model lineage. You want to create a simple, effective solution that can also be reused for future models. What should you do?

  • A. Use a combination of Vertex AI Pipelines and the Vertex AI SDK to integrate metadata tracking into the ML workflow.
  • B. Use Vertex AI Pipelines for model artifacts and MLflow for model lineage.
  • C. Use Vertex AI Experiments for model artifacts and use Vertex ML Metadata for model lineage.
  • D. Implement a scheduled metadata tracking solution using Cloud Composer and Cloud Run functions.
Show Suggested Answer Hide Answer
Suggested Answer: A 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
dija123
3 weeks, 4 days ago
Selected Answer: A
Use the native, purpose-built orchestration service (Pipelines) via its SDK.
upvoted 1 times
...
billyst41
1 month, 3 weeks ago
Selected Answer: A
I'm going with A. C is redundant. Vertex AI Experiments is a higher-level concept that organizes groups of pipeline or training runs for comparative analysis. It relies on the underlying Vertex ML Metadata service, which tracks artifacts and lineage automatically when using pipelines. You wouldn't use them as separate systems for different purposes.
upvoted 2 times
...
bigdapper
2 months, 1 week ago
Selected Answer: A
A is correct. C is wrong because artifacts are tracked in ML Metadata (and surfaced via Experiments), not tracked exclusively in Experiments. Metadata tracks both artifacts and lineage.
upvoted 2 times
...
qaz09
4 months ago
Selected Answer: C
C --> Vertex AI Experiments tracks model executions and artifacts. Vertex ML Metadata tracks model lineage. A - is not a reusable solution. B - MLflow is not Google's solution, so it's less simple to implement. D - this is a custom solution, so less simple.
upvoted 3 times
...
hit_cloudie
5 months, 3 weeks ago
Selected Answer: C
Vertex AI Experiments captures model artifacts and training runs. Vertex ML Metadata tracks lineage, parameters, and outputs. This is native, simple, and reusable within Vertex AI workflows.
upvoted 3 times
...
a38a239
6 months, 1 week ago
Selected Answer: A
Vertex AI Pipelines automatically logs every component’s inputs, outputs, parameters, and artifacts into the built‑in ML Metadata store, giving you end‑to‑end lineage for data, models, and evaluation results. The Vertex AI SDK lets you programmatically log any extra metadata—git commit hashes, container image URIs, custom evaluation reports—directly into the same metadata store from within your training or preprocessing code.
upvoted 2 times
...

Topic 1 Question 306

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 306 discussion

You work for a large retailer, and you need to build a model to predict customer churn. The company has a dataset of historical customer data, including customer demographics, purchase history, and website activity. You need to create the model in BigQuery ML and thoroughly evaluate its performance. What should you do?

  • A. Create a linear regression model in BigQuery ML, and register the model in Vertex AI Model Registry. Use Vertex AI to evaluate the model performance.
  • B. Create a logistic regression model in BigQuery ML, and register the model in Vertex AI Model Registry. Use ML.ARIMA_EVALUATE function to evaluate the model performance.
  • C. Create a linear regression model in BigQuery ML. Use the ML.EVALUATE function to evaluate the model performance.
  • D. Create a logistic regression model in BigQuery ML. Use the ML.CONFUSION_MATRIX function to evaluate the model performance.
Show Suggested Answer Hide Answer
Suggested Answer: D 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
hit_cloudie
5 months, 3 weeks ago
Selected Answer: D
Logistic regression is the right model. ML.CONFUSION_MATRIX is a standard classification evaluation tool for churn prediction.
upvoted 1 times
...
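What BigQuery ML's ML.CONFUSION_MATRIX reports for a binary churn classifier (roughly, `SELECT * FROM ML.CONFUSION_MATRIX(MODEL ..., ...)`) can be reproduced by hand. A minimal sketch with made-up labels and predictions:

```python
def confusion_matrix(y_true, y_pred):
    """Counts for a binary classifier, matching the table that
    BigQuery ML's ML.CONFUSION_MATRIX returns."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = customer churned
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model's predicted labels
cm = confusion_matrix(y_true, y_pred)
precision = cm["tp"] / (cm["tp"] + cm["fp"])
recall = cm["tp"] / (cm["tp"] + cm["fn"])
print(cm, precision, recall)
```

This is also why a linear regression (options A and C) is the wrong fit: a confusion matrix only exists for a classifier producing discrete labels.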

Topic 1 Question 307

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 307 discussion

You are an AI architect at a popular photo sharing social media platform. Your organization's content moderation team currently scans images uploaded by users and removes explicit images manually. You want to implement an AI service to automatically prevent users from uploading explicit images. What should you do?

  • A. Train an image clustering model by using TensorFlow in a Vertex AI Workbench instance. Deploy this model to a Vertex AI endpoint and configure it for online inference. Run this model each time a new image is uploaded to identify and block inappropriate uploads.
  • B. Develop a custom TensorFlow model in a Vertex AI Workbench instance. Train the model on a dataset of manually labeled images. Deploy the model to a Vertex AI endpoint. Run periodic batch inference to identify inappropriate uploads and report them to the content moderation team.
  • C. Create a dataset using manually labeled images. Ingest this dataset into AutoML. Train an image classification model and deploy into a Vertex AI endpoint. Integrate this endpoint with the image upload process to identify and block inappropriate uploads. Monitor predictions and periodically retrain the model.
  • D. Send a copy of every user-uploaded image to a Cloud Storage bucket. Configure a Cloud Run function that triggers the Cloud Vision API to detect explicit content each time a new image is uploaded. Report the classifications to the content moderation team for review.
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
dija123
1 month ago
Selected Answer: C
Option C is the only choice that combines a suitable model-building strategy (AutoML for image classification) with the correct real-time, preventative process required to block uploads as they happen.
upvoted 1 times
...
el_vampiro
2 months ago
Selected Answer: D
Training an explicit image detection model feels like overkill when the Cloud Vision API can already detect explicit content.
upvoted 2 times
...
qaz09
4 months ago
Selected Answer: C
C is the correct answer. A --> clustering is an unsupervised model, but we have a labeled dataset for training, so it's better to use a supervised model for better accuracy. B --> this one describes batch predictions, and for our use case we need online predictions. D --> it works after upload, and the goal here is to prevent users from uploading.
upvoted 2 times
...

Topic 1 Question 308

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 308 discussion

You are an ML engineer at a bank. The bank's leadership team wants to reduce the number of loan defaults. The bank has labeled historic data about loan defaults stored in BigQuery. You have been asked to use AI to support the loan application process. For compliance reasons, you need to provide explanations for loan rejections. What should you do?

  • A. Import the historic loan default data into AutoML. Train and deploy a linear regression model to predict default probability. Report the probability of default for each loan application.
  • B. Create a custom application that uses the Gemini large language model (LLM). Provide the historic data as context to the model, and prompt the model to predict customer defaults. Report the prediction and explanation provided by the LLM for each loan application.
  • C. Train and deploy a BigQuery ML classification model trained on historic loan default data. Enable feature-based explanations for each prediction. Report the prediction, probability of default, and feature attributions for each loan application.
  • D. Load the historic loan default data into a Vertex AI Workbench instance. Train a deep learning classification model using TensorFlow to predict loan default. Run inference for each loan application, and report the predictions.
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
qaz09
4 months ago
Selected Answer: C
C is the only option that describes a step where we provide explanations for loan rejections.
upvoted 2 times
...
hit_cloudie
5 months, 3 weeks ago
Selected Answer: C
BigQuery ML supports model training on structured data, prediction with SHAP-style feature attributions, easy deployment, and full traceability for compliance.
upvoted 3 times
...
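For a logistic regression model like the one in option C, per-prediction feature attributions have a simple closed form: each feature's contribution to the log-odds is its coefficient times its baseline-shifted value. A minimal sketch with hypothetical coefficients (in practice these would come from the trained BigQuery ML model):

```python
import math

# Hypothetical trained coefficients on standardized features.
coefs = {"income": -0.8, "debt_ratio": 1.5, "late_payments": 0.9}
intercept = -0.2

def predict_with_attributions(features, baseline):
    """Return default probability plus each feature's log-odds
    contribution relative to a baseline applicant (linear attribution)."""
    attributions = {
        name: coefs[name] * (features[name] - baseline[name])
        for name in coefs
    }
    logit = intercept + sum(coefs[n] * features[n] for n in coefs)
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, attributions

baseline = {"income": 0.0, "debt_ratio": 0.0, "late_payments": 0.0}
applicant = {"income": -1.2, "debt_ratio": 0.7, "late_payments": 2.0}
prob, attr = predict_with_attributions(applicant, baseline)
print(round(prob, 3), attr)
```

Reporting these per-feature contributions alongside the probability is what makes the rejection explainable to a regulator: "debt ratio and late payments pushed the score up, income pushed it down."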

Topic 1 Question 309

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 309 discussion

You are developing a natural language processing model that analyzes customer feedback to identify positive, negative, and neutral experiences. During the testing phase, you notice that the model demonstrates a significant bias against certain demographic groups, leading to skewed analysis results. You want to address this issue following Google's responsible AI practices. What should you do?

  • A. Use Vertex AI's model evaluation to assess bias in the model's predictions, and use post-processing to adjust outputs for identified demographic discrepancies.
  • B. Implement a more complex model architecture that can capture nuanced patterns in language to reduce bias.
  • C. Audit the training dataset to identify underrepresented groups and augment the dataset with additional samples before retraining the model.
  • D. Use Vertex Explainable AI to generate explanations and systematically adjust the predictions to address identified biases.
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
dija123
1 month ago
Selected Answer: C
The core principle of responsible AI is to address problems at their source. In machine learning, model bias almost always originates from the training data.
upvoted 3 times
...
Bardapapa
1 month, 3 weeks ago
Selected Answer: A
This option represents the standard detect, measure, and mitigate cycle emphasized in MLOps and Responsible AI frameworks, leveraging native cloud tools for the immediate problem (a biased, deployed model).
upvoted 2 times
...
qaz09
4 months ago
Selected Answer: A
A -> here we are using Google's recommended tool for bias evaluation (https://cloud.google.com/vertex-ai/docs/evaluation/model-bias-metrics). B -> using a more complex model does not address bias directly. C -> manual work, and we cannot expect that there is an option to add more samples to the training dataset. D -> adjusting predictions is manual work.
upvoted 1 times
spradhan
3 months, 2 weeks ago
Yes, but we cannot adjust the output in post-processing; pre-processing mitigation would make more sense. As per Google (https://developers.google.com/machine-learning/crash-course/fairness/mitigating-bias), bias can be mitigated by augmenting the data, getting more data, or changing the loss function.
upvoted 1 times
...
...
hit_cloudie
5 months, 3 weeks ago
Selected Answer: C
Google's Responsible AI best practices prioritize dataset auditing and fairness through balanced representation. This helps address the root cause of the bias (biased training data).
upvoted 4 times
...
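The audit-and-augment approach in option C can be sketched in a few lines: count per-group representation in the training set, then oversample underrepresented groups until the set is balanced. The data is toy, and duplication is a crude stand-in; a real augmentation effort would collect or generate genuinely new samples for the minority groups.

```python
import random
from collections import Counter

def audit(examples):
    """Count how many training examples each demographic group has."""
    return Counter(e["group"] for e in examples)

def oversample_to_balance(examples, seed=0):
    """Duplicate minority-group examples until all groups match the
    largest group's count."""
    rng = random.Random(seed)
    counts = audit(examples)
    target = max(counts.values())
    balanced = list(examples)
    for group, n in counts.items():
        pool = [e for e in examples if e["group"] == group]
        balanced.extend(rng.choice(pool) for _ in range(target - n))
    return balanced

data = [{"group": "A", "text": f"a{i}"} for i in range(8)] + \
       [{"group": "B", "text": f"b{i}"} for i in range(2)]
print(audit(data))                        # group B is underrepresented
print(audit(oversample_to_balance(data)))  # balanced after augmentation
```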

Topic 1 Question 310

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 310 discussion

You recently deployed an image classification model on Google Cloud. You used Cloud Build to build a CI/CD pipeline for the model. You need to ensure that the model stays up-to-date with data and code changes by using an efficient retraining process. What should you do?

  • A. Use Cloud Run functions to monitor data drift in real time and trigger a Vertex AI Training job to retrain the model when data drift exceeds a predetermined threshold.
  • B. Configure a Git repository trigger in Cloud Build to initiate retraining when there are new code commits to the model's repository and a Pub/Sub trigger when there is new data in Cloud Storage.
  • C. Use Cloud Scheduler to initiate a daily retraining job in Vertex AI Pipelines.
  • D. Configure Cloud Composer to orchestrate a weekly retraining job that includes data extraction from BigQuery, model retraining with Vertex AI Training, and model deployment to a Vertex AI endpoint.
Show Suggested Answer Hide Answer
Suggested Answer: B 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
qaz09
4 months ago
Selected Answer: B
B -> both code changes and data changes are addressed in this option C, D --> Efficient retraining process suggests that retraining should be triggered when there is a need for one (when data changes or new code is added) -> this excludes option C (daily retraining) and D (weekly retraining) A -> in Option A there does not trigger retraining when the code changes
upvoted 1 times
...
hit_cloudie
5 months, 3 weeks ago
Selected Answer: B
This is a CI/CD best practice: Cloud Build handles retraining on code changes (via Git trigger). Pub/Sub can trigger retraining on new data (e.g., new files in Cloud Storage). Efficient and automated.
upvoted 1 times
...

Topic 1 Question 311

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 311 discussion

You lead a data science team that is working on a computationally intensive project involving running several experiments. Your team is geographically distributed and requires a platform that provides the most effective real-time collaboration and rapid experimentation. You plan to add GPUs to speed up your experimentation cycle, and you want to avoid having to manually set up the infrastructure. You want to use the Google-recommended approach. What should you do?

  • A. Configure a managed Dataproc cluster for large-scale data processing. Configure individual Jupyter notebooks on VMs that each team member uses for experimentation and model development.
  • B. Use Colab Enterprise with Cloud Storage for data management. Use a Git repository for version control.
  • C. Use Vertex AI Workbench and Cloud Storage for data management. Use a Git repository for version control.
  • D. Configure a distributed JupyterLab instance that each team member can access on a Compute Engine VM. Use a shared code repository for version control.
Show Suggested Answer Hide Answer
Suggested Answer: B 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
Bardapapa
1 month, 3 weeks ago
Selected Answer: B
best solution
upvoted 1 times
...
bigdapper
2 months, 1 week ago
Selected Answer: B
Workbench requires infrastructure setup. Colab does not.
upvoted 2 times
...
Fer660
2 months, 1 week ago
Selected Answer: B
This is the textbook use case for Colab Enterprise.
upvoted 2 times
...
spradhan
3 months, 2 weeks ago
Selected Answer: B
Go for Colab if you want real-time collaboration. Go for Workbench if you need control. https://cloud.google.com/vertex-ai/docs/workbench/notebook-solution
upvoted 3 times
...
qaz09
4 months ago
Selected Answer: B
I think Colab is the better option here, based on https://www.tensorops.ai/post/vertex-ai-workbench-vs-colab-enterprise-which-notebook-solution-is-right-for-you. For this use case we need something collaborative and fully managed, hence Colab. For Workbench notebooks we need to manually set up some infrastructure.
upvoted 2 times
...
hit_cloudie
5 months, 3 weeks ago
Selected Answer: C
C. Vertex AI Workbench provides pre-configured JupyterLab, native GPU support, seamless integration with Git and GCS, and no manual infra setup. Google-recommended for collaborative ML workflows.
upvoted 4 times
...
f4ccd7a
6 months, 2 weeks ago
Selected Answer: C
Colab is better for real-time collaboration
upvoted 2 times
...

Topic 1 Question 312

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 312 discussion

You need to train a ControlNet model with Stable Diffusion XL for an image editing use case. You want to train this model as quickly as possible. Which hardware configuration should you choose to train your model?

  • A. Configure one a2-highgpu-1g instance with an NVIDIA A100 GPU with 80 GB of RAM. Use float32 precision during model training.
  • B. Configure one a2-highgpu-1g instance with an NVIDIA A100 GPU with 80 GB of RAM. Use bfloat16 quantization during model training.
  • C. Configure four n1-standard-16 instances, each with one NVIDIA Tesla T4 GPU with 16 GB of RAM. Use float32 precision during model training.
  • D. Configure four n1-standard-16 instances, each with one NVIDIA Tesla T4 GPU with 16 GB of RAM. Use float16 quantization during model training.
Show Suggested Answer Hide Answer
Suggested Answer: B 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
qaz09
4 months ago
Selected Answer: B
C & D are rejected due to the small amount of GPU RAM. bfloat16 is supported on machines with the A100 GPU (80 GB RAM) and will help decrease time to converge without losing accuracy.
upvoted 3 times
...
hit_cloudie
5 months, 3 weeks ago
Selected Answer: B
NVIDIA A100 supports bfloat16, which trains faster and uses less memory than float32, with minimal accuracy loss. Best for high-throughput training.
upvoted 1 times
...
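The bfloat16 trade-off behind option B can be seen directly: bfloat16 keeps float32's 8-bit exponent (so the same dynamic range, important for training stability) but only 7 mantissa bits. A minimal sketch that converts a float to bfloat16 by keeping the top 16 bits of its float32 pattern (simple truncation; real hardware uses round-to-nearest-even):

```python
import struct

def to_bfloat16(x):
    """Approximate a float as bfloat16 by truncating its float32 bit
    pattern to the top 16 bits (sign, 8 exponent bits, 7 mantissa bits)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(to_bfloat16(3.14159))   # precision drops: ~3 significant digits
print(to_bfloat16(1e30))      # huge values survive: same exponent range as float32
```

The coarser mantissa costs little in deep-learning accuracy, while halving memory traffic versus float32, which is where the speedup on the A100 comes from.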

Topic 1 Question 313

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 313 discussion

You are the lead ML engineer on a mission-critical project that involves analyzing massive datasets using Apache Spark. You need to establish a robust environment that allows your team to rapidly prototype Spark models using Jupyter notebooks. What is the fastest way to achieve this?

  • A. Set up a Vertex AI Workbench instance with a Spark kernel.
  • B. Use Colab Enterprise with a Spark kernel.
  • C. Set up a Dataproc cluster with Spark and use Jupyter notebooks.
  • D. Configure a Compute Engine instance with Spark and use Jupyter notebooks.
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
el_vampiro
2 months ago
Selected Answer: A
Workbench with Dataproc Serverless, because they want the fastest way. Dataproc cluster creation takes longer than a Workbench instance. Also, there is no mention of an existing cluster - setting one up just to run a query doesn't make sense.
upvoted 1 times
...
4d742d7
5 months ago
Selected Answer: C
Since we need a robust environment, a Dataproc cluster is better
upvoted 1 times
...
hit_cloudie
5 months, 3 weeks ago
Selected Answer: C
Dataproc is Google's managed Spark service, and it supports Jupyter notebooks natively. This allows rapid setup, scalability, and is optimized for massive datasets — exactly what’s needed here.
upvoted 2 times
...

Topic 1 Question 314

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 314 discussion

You are training a large-scale deep learning model on a Cloud TPU. While monitoring the training progress through Tensorboard, you observe that the TPU utilization is consistently low and there are delays between the completion of one training step and the start of the next step. You want to improve TPU utilization and overall training performance. How should you address this issue?

  • A. Apply tf.data.Dataset.map with vectorized operations and parallelization.
  • B. Use tf.data.Dataset.interleave with multiple data sources.
  • C. Use tf.data.Dataset.cache on the dataset after the first epoch.
  • D. Implement tf.data.Dataset.prefetch in the data pipeline.
Show Suggested Answer Hide Answer
Suggested Answer: D 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
hit_cloudie
5 months, 3 weeks ago
Selected Answer: D
tf.data.Dataset.prefetch in the pipeline ✅ Correct — prefetch() overlaps data preprocessing with model execution, ensuring the next batch is prepared while the current batch is being processed, thus maximizing TPU utilization and minimizing idle time.
upvoted 3 times
...
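The effect of Dataset.prefetch, overlapping input preparation with the accelerator's step, can be simulated without TensorFlow using a background thread and a bounded queue. This is a sketch of the idea only, not the tf.data implementation: with equal 50 ms producer and consumer costs, the overlapped run takes roughly half the serial time.

```python
import queue
import threading
import time

def produce(batches, out):
    """Preprocess batches on a background thread, like Dataset.prefetch(1)."""
    for b in batches:
        time.sleep(0.05)   # simulated preprocessing cost per batch
        out.put(b)
    out.put(None)          # end-of-stream sentinel

def train(batches):
    q = queue.Queue(maxsize=1)   # bounded buffer = prefetch depth
    threading.Thread(target=produce, args=(batches, q), daemon=True).start()
    steps = 0
    while (batch := q.get()) is not None:
        time.sleep(0.05)   # simulated TPU step; overlaps with the producer
        steps += 1
    return steps

start = time.time()
n = train(range(10))
elapsed = time.time() - start
# Serial cost would be ~10 * (0.05 + 0.05) = 1.0 s; overlap cuts it to ~0.55 s.
print(n, round(elapsed, 2))
```

In real code the equivalent one-liner is `dataset.prefetch(tf.data.AUTOTUNE)` at the end of the input pipeline.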

Topic 1 Question 315

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 315 discussion

You are building an ML pipeline to process and analyze both streaming and batch datasets. You need the pipeline to handle data validation, preprocessing, model training, and model deployment in a consistent and automated way. You want to design an efficient and scalable solution that captures model training metadata and is easily reproducible. You want to be able to reuse custom components for different parts of your pipeline. What should you do?

  • A. Use Cloud Composer for distributed processing of batch and streaming data in the pipeline.
  • B. Use Dataflow for distributed processing of batch and streaming data in the pipeline.
  • C. Use Cloud Build to build and push Docker images for each pipeline component.
  • D. Implement an orchestration framework such as Kubeflow Pipelines or Vertex AI Pipelines.
Show Suggested Answer Hide Answer
Suggested Answer: D 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
hit_cloudie
Highly Voted 5 months, 3 weeks ago
Selected Answer: D
These are purpose-built ML orchestration frameworks. They support metadata tracking, reusability of components, end-to-end automation, reproducibility, and integration with Vertex AI services.
upvoted 6 times
...

Topic 1 Question 316

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 316 discussion

You are developing an ML model on Vertex AI that needs to meet specific interpretability requirements for regulatory compliance. You want to use a combination of model architectures and modeling techniques to maximize accuracy and interpretability. How should you create the model?

  • A. Use a convolutional neural network (CNN)-based deep learning model architecture, and use local interpretable model-agnostic explanations (LIME) for interpretability.
  • B. Use a recurrent neural network (RNN)-based deep learning model architecture, and use integrated gradients for interpretability.
  • C. Use a boosted decision tree-based model architecture, and use SHAP values for interpretability.
  • D. Use a long short-term memory (LSTM)-based model architecture, and use local interpretable model-agnostic explanations (LIME) for interpretability.
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
qaz09
4 months ago
Selected Answer: C
Boosted trees are the only highly interpretable model here. The rest are more "black box" solutions.
upvoted 3 times
...
hit_cloudie
5 months, 3 weeks ago
Selected Answer: C
Boosted decision trees (e.g., XGBoost, LightGBM) offer high accuracy on structured data and work well in production. SHAP (SHapley Additive exPlanations) provides strong theoretical interpretability and is accepted in regulatory environments.
upvoted 2 times
...
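SHAP values, mentioned in option C, are exact Shapley values over feature subsets: each feature's attribution is its average marginal contribution across all orderings in which features are revealed. For a toy two-feature model this can be computed by brute force (a hypothetical scoring function; real tooling such as the shap library computes this efficiently for trees):

```python
from itertools import permutations

def shapley(model, features, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over all orderings, holding absent features at the baseline."""
    names = list(features)
    values = {n: 0.0 for n in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        prev = model(current)
        for name in order:
            current[name] = features[name]
            now = model(current)
            values[name] += now - prev
            prev = now
    return {n: v / len(orders) for n, v in values.items()}

# A toy scoring model with an interaction term.
def model(f):
    return 2.0 * f["x"] + 3.0 * f["y"] + f["x"] * f["y"]

phi = shapley(model, {"x": 1.0, "y": 1.0}, {"x": 0.0, "y": 0.0})
print(phi)   # attributions sum to model(features) - model(baseline)
```

The sum-to-prediction property (the "efficiency" axiom) is what makes SHAP defensible in regulatory settings: the attributions fully account for the model's output.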

Topic 1 Question 317

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 317 discussion

You have developed a fraud detection model for a large financial institution using Vertex AI. The model achieves high accuracy, but the stakeholders are concerned about the model's potential for bias based on customer demographics. You have been asked to provide insights into the model's decision-making process and identify any fairness issues. What should you do?

  • A. Create feature groups using Vertex AI Feature Store to segregate customer demographic features and non-demographic features. Retrain the model using only non-demographic features.
  • B. Use feature attribution in Vertex AI to analyze model predictions and the impact of each feature on the model's predictions.
  • C. Enable Vertex AI Model Monitoring to detect training-serving skew. Configure an alert to send an email when the skew or drift for a model's feature exceeds a predefined threshold. Re-train the model by appending new data to existing training data.
  • D. Compile a dataset of unfair predictions. Use Vertex AI Vector Search to identify similar data points in the model's predictions. Report these data points to the stakeholders.
Show Suggested Answer Hide Answer
Suggested Answer: B 🗳️

Comments

Chosen Answer:
This is a voting comment ( ? ) . It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
hit_cloudie
5 months, 3 weeks ago
Selected Answer: B
Feature attribution (e.g., SHAP values) helps you understand how much influence each feature has (including demographics). This is ideal for identifying bias and fairness issues in model predictions.
upvoted 3 times
...

Topic 1 Question 318

exam questions

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 318 discussion

You developed an ML model using Vertex AI and deployed it to a Vertex AI endpoint. You anticipate that the model will need to be retrained as new data becomes available. You have configured a Vertex AI Model Monitoring Job. You need to monitor the model for feature attribution drift and establish continuous evaluation metrics. What should you do?

  • A. Set up alerts using Cloud Logging, and use the Vertex AI console to review feature attributions.
  • B. Set up alerts using Cloud Logging, and use Looker Studio to create a dashboard that visualizes feature attribution drift. Review the dashboard periodically.
  • C. Enable request-response logging for the Vertex AI endpoint, and set up alerts using Pub/Sub. Create a Cloud Run function to run TensorFlow Data Validation on your dataset.
  • D. Enable request-response logging for the Vertex AI endpoint, and set up alerts using Cloud Logging. Review the feature attributions in the Google Cloud console when an alert is received.
Suggested Answer: A 🗳️

Comments
dija123
3 weeks, 4 days ago
Selected Answer: A
Totally agree with A.
upvoted 1 times
...
el_vampiro
2 months ago
Selected Answer: A
qaz01 is right
upvoted 1 times
...
qaz09
4 months ago
Selected Answer: A
Not C, D: if an endpoint has Model Monitoring enabled, you can't enable request-response logging for the same endpoint (https://cloud.google.com/vertex-ai/docs/predictions/online-prediction-logging#model-monitoring). A is a simpler solution than B; I don't think there is a need to create custom Looker Studio dashboards, since you can use the Vertex AI console.
upvoted 4 times
...
hit_cloudie
5 months, 3 weeks ago
Selected Answer: D
This leverages Vertex AI Model Monitoring with request-response logging, Cloud Logging alerts, and built-in feature attribution drift review.
upvoted 2 times
...
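
The drift the monitoring job watches for can be illustrated offline: compare the distribution of attribution scores from training time against those logged at serving time with a distance measure. A hedged sketch (the data and alert threshold are made up; Vertex AI Model Monitoring computes this for you):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)

# Hypothetical per-request attribution scores for one feature,
# at training time vs. after several weeks of serving.
train_attr = rng.normal(loc=0.30, scale=0.05, size=2000)
serve_attr = rng.normal(loc=0.45, scale=0.05, size=2000)  # drifted

# Histogram both samples on a shared grid, then measure JS distance.
bins = np.linspace(0.0, 1.0, 51)
p, _ = np.histogram(train_attr, bins=bins, density=True)
q, _ = np.histogram(serve_attr, bins=bins, density=True)
distance = jensenshannon(p, q)

THRESHOLD = 0.3  # illustrative alert threshold
print(f"JS distance: {distance:.3f}, alert: {distance > THRESHOLD}")
```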

Topic 1 Question 319

You work as an ML researcher at an investment bank, and you are experimenting with the Gemma large language model (LLM). You plan to deploy the model for an internal use case. You need to have full control of the model's underlying infrastructure and minimize the model's inference time. Which serving configuration should you use for this task?

  • A. Deploy the model on a Vertex AI endpoint manually by creating a custom inference container.
  • B. Deploy the model on a Google Kubernetes Engine (GKE) cluster by using the deployment options in Model Garden.
  • C. Deploy the model on a Vertex AI endpoint by using one-click deployment in Model Garden.
  • D. Deploy the model on a Google Kubernetes Engine (GKE) cluster manually by creating a custom YAML manifest.
Suggested Answer: D 🗳️

Comments
vamgcp
1 month, 3 weeks ago
Selected Answer: D
Deploying the model on GKE with a custom YAML manifest allows maximum control over infrastructure and latency, aligning with the need for low inference time and internal model use. Vertex AI's one-click deployment (Option C) limits control, and deploying on Vertex AI with a custom container (Option A) doesn't allow for as much customization as a GKE setup.
upvoted 3 times
...
Fer660
2 months, 1 week ago
Selected Answer: B
Not A or C: Vertex AI deployment will not give you full control of the underlying infra. Not D: because B is a faster path to deployment. B: in order to minimize time to deployment, start with a standard deployment, then tweak the deployment to meet your latency requirements.
upvoted 1 times
...
hit_cloudie
5 months, 3 weeks ago
Selected Answer: A
Provides full control over the container and environment while benefiting from Vertex AI’s optimized serving infrastructure. You can control dependencies, runtime, and hardware (e.g., GPUs), which is ideal for internal LLM serving with low latency.
upvoted 3 times
...
tmpuserx
6 months ago
Selected Answer: D
Model Garden deployment options typically use default configurations not optimized for low latency
upvoted 2 times
...
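
Option D amounts to writing your own Deployment and Service specs. A minimal, hypothetical sketch of such a manifest (the image name, resource requests, and GPU count are placeholders; a real Gemma deployment would typically use a vLLM or TGI serving image with tuned resources):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gemma-server          # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gemma-server
  template:
    metadata:
      labels:
        app: gemma-server
    spec:
      containers:
      - name: inference
        image: us-docker.pkg.dev/my-project/serving/gemma:latest  # placeholder image
        ports:
        - containerPort: 8080
        resources:
          limits:
            nvidia.com/gpu: 1  # full control over accelerator choice
---
apiVersion: v1
kind: Service
metadata:
  name: gemma-service
spec:
  selector:
    app: gemma-server
  ports:
  - port: 80
    targetPort: 8080
```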

Topic 1 Question 320

You are an ML researcher and are evaluating multiple deep learning-based model architectures and hyperparameter configurations. You need to implement a robust solution to track the progress of each model iteration, visualize key metrics, gain insights into model internals, and optimize training performance.

You want your solution to have the most efficient and powerful approach to compare the models and have the strongest visualization abilities. How should you build this solution?

  • A. Use Vertex AI TensorBoard for in-depth visualization and analysis, and use BigQuery for experiment tracking and analysis.
  • B. Use Vertex AI TensorBoard for visualizing training progress and model behavior, and use Vertex AI Feature Store to store and manage experiment data for analysis and reproducibility.
  • C. Use Vertex AI Experiments for tracking iterations and comparison, and use Vertex AI TensorBoard for visualization and analysis of the training metrics and model architecture.
  • D. Use Vertex AI Experiments for tracking iterations and comparison, and use BigQuery and Looker Studio for visualization and analysis of the training metrics and model architecture.
Suggested Answer: C 🗳️

Comments
vamgcp
1 month, 3 weeks ago
Selected Answer: C
C is the answer
upvoted 1 times
...
kirukkuman
4 months, 1 week ago
Selected Answer: C
This is the ideal combination. Vertex AI Experiments is specifically designed to log, track, and compare different model runs, including their parameters and performance metrics. Vertex AI TensorBoard is the premier tool for deep, interactive visualization of ML training, allowing you to inspect model graphs, view metrics over time, analyze embedding projections, and more. These two services are designed to work together seamlessly, providing the most efficient and powerful solution.
upvoted 3 times
...
alja12
4 months, 1 week ago
Selected Answer: C
Vertex AI Experiments (designed specifically for tracking model iterations, hyperparameters, and experiment metadata — perfect for comparing experiments) + Vertex AI TensorBoard (provides rich visualization of training progress, model internals, and behavior)
upvoted 2 times
...

Topic 1 Question 321

You are developing a model to detect fraudulent credit card transactions. You need to prioritize detection, because missing even one fraudulent transaction could severely impact the credit card holder. You used AutoML to train a model on users' profile information and credit card transaction data. After training the initial model, you notice that the model is failing to detect many fraudulent transactions. How should you increase the number of fraudulent transactions that are detected?

  • A. Add more non-fraudulent examples to the training set.
  • B. Reduce the maximum number of node hours for training.
  • C. Increase the probability threshold to classify a fraudulent transaction.
  • D. Decrease the probability threshold to classify a fraudulent transaction.
Suggested Answer: D 🗳️

Comments
qaz09
4 months ago
Selected Answer: D
Lower probability threshold -> more transactions marked as fraud
upvoted 3 times
...
hit_cloudie
5 months, 3 weeks ago
Selected Answer: D
This will increase the number of transactions flagged as fraudulent, improving recall (even at the cost of more false positives, which is acceptable in fraud detection).
upvoted 4 times
...
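
Option D's effect is easy to verify numerically: lowering the decision threshold flags more transactions as fraudulent, which raises recall at the cost of precision. A small sketch on synthetic imbalanced data (the dataset and threshold values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Imbalanced "fraud" dataset: roughly 5% positive class.
X, y = make_classification(n_samples=4000, weights=[0.95], flip_y=0.05,
                           random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
proba = model.predict_proba(X)[:, 1]

recall_default = recall_score(y, proba >= 0.5)   # default threshold
recall_lowered = recall_score(y, proba >= 0.2)   # lowered threshold

# Lowering the threshold can only add predicted positives,
# so recall is monotonically non-decreasing.
print(f"recall @0.5: {recall_default:.2f}, recall @0.2: {recall_lowered:.2f}")
```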

Topic 1 Question 322

You work at an organization that maintains a cloud-based communication platform that integrates conventional chat, voice, and video conferencing into one platform. The audio recordings are stored in Cloud Storage. All recordings have a 16 kHz sample rate and are more than one minute long. You need to implement a new feature in the platform that will automatically transcribe voice call recordings into text for future applications, such as call summarization and sentiment analysis. How should you implement the voice call transcription feature while following Google-recommended practices?

  • A. Use the original audio sampling rate, and transcribe the audio by using the Speech-to-Text API with synchronous recognition.
  • B. Use the original audio sampling rate, and transcribe the audio by using the Speech-to-Text API with asynchronous recognition.
  • C. Downsample the audio recordings to 8 kHz, and transcribe the audio by using the Speech-to-Text API with synchronous recognition.
  • D. Downsample the audio recordings to 8 kHz, and transcribe the audio by using the Speech-to-Text API with asynchronous recognition.
Suggested Answer: B 🗳️

Comments
OpenKnowledge
3 weeks, 5 days ago
Selected Answer: B
In machine learning for audio, the sample rate is the number of audio samples taken per second, measured in Hertz (Hz) or kilohertz (kHz). It is crucial because it determines the amount of detail captured in the audio and affects data size. ML models require a consistent sample rate across all audio inputs to process the data correctly. A common sampling rate used in training speech models is 16,000 Hz or 16 kHz.
upvoted 1 times
...
Fer660
2 months, 1 week ago
Selected Answer: B
We do not need synchronous recognition, as the transcripts are needed for future use.
upvoted 2 times
...
c797628
2 months, 2 weeks ago
Selected Answer: B
The answer is B 100%
upvoted 2 times
...

Topic 1 Question 323

You have created multiple versions of an ML model and have imported them to Vertex AI Model Registry. You want to perform A/B testing to identify the best performing model using the simplest approach. What should you do?

  • A. Split incoming traffic to distribute prediction requests among the versions. Monitor the performance of each version using Vertex AI's built-in monitoring tools.
  • B. Split incoming traffic among Google Kubernetes Engine (GKE) clusters, and use Traffic Director to distribute prediction requests to different versions. Monitor the performance of each version using Cloud Monitoring.
  • C. Split incoming traffic to distribute prediction requests among the versions. Monitor the performance of each version using Looker Studio dashboards that compare logged data for each version.
  • D. Split incoming traffic among separate Cloud Run instances of deployed models. Monitor the performance of each version using Cloud Monitoring.
Suggested Answer: A 🗳️

Comments
jgsec
2 months, 3 weeks ago
Selected Answer: A
A is the simplest solution
upvoted 3 times
...
qaz09
4 months ago
Selected Answer: A
I think A. B: not the simplest solution, since you have to manage GKE clusters. C: you need to create Looker Studio dashboards, so not the simplest option. D: you need to configure Cloud Run, hence not the simplest approach.
upvoted 1 times
...
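
Under option A, both model versions sit behind one endpoint and Vertex AI routes a configured fraction of requests to each (in the SDK this is a traffic-split mapping such as `{"v1": 80, "v2": 20}`). A local simulation of that routing logic (the split values and request count are made up):

```python
import random
from collections import Counter

random.seed(0)
traffic_split = {"model-v1": 80, "model-v2": 20}  # percentages, as in Vertex AI

versions = list(traffic_split)
weights = [traffic_split[v] for v in versions]

# Route 10,000 simulated prediction requests according to the split,
# then count how many each version served.
routed = Counter(random.choices(versions, weights=weights, k=10_000))
for version, count in routed.items():
    print(f"{version}: {count} requests ({count / 100:.1f}%)")
```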

Topic 1 Question 324

You need to train an XGBoost model on a small dataset. Your training code requires custom dependencies. You need to set up a Vertex AI custom training job. You want to minimize the startup time of the training job while following Google-recommended practices. What should you do?

  • A. Create a custom container that includes the data and the custom dependencies. In your training application, load the data into a pandas DataFrame and train the model.
  • B. Store the data in a Cloud Storage bucket, and use the XGBoost prebuilt custom container to run your training application. Create a Python source distribution that installs the custom dependencies at runtime. In your training application, read the data from Cloud Storage and train the model.
  • C. Use the XGBoost prebuilt custom container. Create a Python source distribution that includes the data and installs the custom dependencies at runtime. In your training application, load the data into a pandas DataFrame and train the model.
  • D. Store the data in a Cloud Storage bucket, and create a custom container with your training application and its custom dependencies. In your training application, read the data from Cloud Storage and train the model.
Suggested Answer: D 🗳️

Comments
qaz09
4 months ago
Selected Answer: D
question 278 -> D
upvoted 2 times
...
kirukkuman
4 months, 1 week ago
Selected Answer: D
The key to minimizing startup time is to pre-install all necessary libraries and custom dependencies into a custom container image. When the Vertex AI training job starts, it can immediately run your code without wasting time downloading and installing packages. This directly addresses the main requirement.
upvoted 3 times
...
AlizCert
5 months ago
Selected Answer: D
On the basis of exclusion. B and C can't be (installing deps at runtime). A (container contains data - antipattern). -> D
upvoted 4 times
...
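
Option D's custom container can be as small as a Dockerfile that bakes the dependencies in at build time, so nothing is installed when the job starts. A hedged sketch (the base image, package list, and paths are hypothetical):

```dockerfile
# Start from a slim Python base; dependencies are installed at BUILD time,
# so the Vertex AI training job starts without any runtime pip installs.
FROM python:3.11-slim

# "my-custom-package" stands in for the job's custom dependencies.
RUN pip install --no-cache-dir xgboost pandas google-cloud-storage my-custom-package

COPY trainer/ /trainer/

# Vertex AI runs this entrypoint; the script reads the training data from
# a Cloud Storage bucket and trains the XGBoost model.
ENTRYPOINT ["python", "/trainer/train.py"]
```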

Topic 1 Question 325

You are building an ML model to predict customer churn for a subscription service. You have trained your model on Vertex AI using historical data, and deployed it to a Vertex AI endpoint for real-time predictions. After a few weeks, you notice that the model's performance, measured by AUC (area under the ROC curve), has dropped significantly in production compared to its performance during training. How should you troubleshoot this problem?

  • A. Monitor the training/serving skew of feature values for requests sent to the endpoint.
  • B. Monitor the resource utilization of the endpoint, such as CPU and memory usage, to identify potential bottlenecks in performance.
  • C. Enable Vertex Explainable AI feature attribution to analyze model predictions and understand the impact of each feature on the model's predictions.
  • D. Monitor the latency of the endpoint to determine whether predictions are being served within the expected time frame.
Suggested Answer: A 🗳️

Comments
qaz09
4 months ago
Selected Answer: A
Since the performance dropped after a few weeks, the solution is to monitor training/serving skew
upvoted 3 times
...
kirukkuman
4 months, 1 week ago
Selected Answer: A
A significant drop in AUC means the patterns the model learned are no longer effective on the new, live data. Detecting a skew or drift between the statistical properties of the training data and the serving data is the most direct way to confirm and diagnose this issue. Vertex AI Model Monitoring is designed specifically for this purpose.
upvoted 3 times
...
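
Option A's check can be prototyped as a two-sample test per feature: compare training-time values against the values seen in serving requests. A sketch using the Kolmogorov-Smirnov test (the data and significance level are illustrative; Vertex AI Model Monitoring automates this per feature):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical values of one feature ("monthly_usage") at training time
# vs. in recent prediction requests.
train_values = rng.normal(loc=50.0, scale=10.0, size=3000)
serving_values = rng.normal(loc=58.0, scale=10.0, size=3000)  # shifted

stat, p_value = ks_2samp(train_values, serving_values)

ALPHA = 0.01  # illustrative significance level
skew_detected = p_value < ALPHA
print(f"KS statistic: {stat:.3f}, skew detected: {skew_detected}")
```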

Topic 1 Question 326

You work at an organization that manages a popular payment app. You built a fraudulent transaction detection model by using scikit-learn and deployed it to a Vertex AI endpoint. The endpoint is currently using 1 e2-standard-2 machine with 2 vCPUs and 8 GB of memory. You discover that traffic on the gateway fluctuates to four times more than the endpoint's capacity. You need to address this issue by using the most cost-effective approach. What should you do?

  • A. Re-deploy the model with a TPU accelerator.
  • B. Change the machine type to e2-highcpu-32 with 32 vCPUs and 32 GB of memory.
  • C. Set up a monitoring job and an alert for CPU usage. If you receive an alert, scale the vCPUs as needed.
  • D. Increase the number of maximum replicas to 6 nodes, each with 1 e2-standard-2 machine.
Suggested Answer: D 🗳️

Comments
Fer660
2 months, 1 week ago
Selected Answer: D
Not A: scikit-learn does not support TPUs. Not B: traffic is 4x oversubscribed, but here we are deploying 16x the vCPUs, which seems overkill. Not C: we already know there is a problem; no need to wait for additional warnings. Yes to D: a reasonable amount of scaling (~6x) given what we know.
upvoted 3 times
...
qaz09
4 months ago
Selected Answer: D
scikit not supported on TPU, so not A option D is the one with autoscaling
upvoted 3 times
...
ricardovazz
4 months, 1 week ago
Selected Answer: D
D, scale horizontally increasing replicas for fluctuating traffic
upvoted 1 times
...
kirukkuman
4 months, 1 week ago
Selected Answer: D
The correct answer is D. This approach uses horizontal autoscaling, which is the most cost-effective and efficient way to handle fluctuating traffic on a Vertex AI endpoint
upvoted 2 times
...
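
The arithmetic behind option D: with peak traffic at four times one replica's capacity, autoscaling across a handful of identical small machines covers the peak and scales back down when traffic drops. A back-of-the-envelope sketch (the per-replica QPS figure is hypothetical):

```python
import math

per_replica_qps = 50          # hypothetical capacity of one e2-standard-2 replica
baseline_qps = 50             # current traffic roughly matches one replica
peak_qps = 4 * baseline_qps   # traffic fluctuates to 4x the endpoint's capacity

# Replicas needed at peak, plus headroom for spikes beyond the observed 4x.
needed = math.ceil(peak_qps / per_replica_qps)
max_replicas = needed + 2     # matches the option's 6 nodes

print(f"replicas needed at peak: {needed}, configured max: {max_replicas}")
```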

Topic 1 Question 327

You are developing an AI text generator that will be able to dynamically adapt its generated responses to mirror the writing style of the user and mimic famous authors if their style is detected. You have a large dataset of various authors' works, and you plan to host the model on a custom VM. You want to use the most effective model. What should you do?

  • A. Deploy Llama 3 from Model Garden, and use prompt engineering techniques.
  • B. Fine-tune a BERT-based model from TensorFlow Hub.
  • C. Fine-tune Llama 3 from Model Garden on Vertex AI Pipelines.
  • D. Use the Gemini 1.5 Flash foundational model to build the text generator.
Suggested Answer: C 🗳️

Comments
c797628
Highly Voted 2 months, 2 weeks ago
Selected Answer: C
Don't get distracted by Gemini just because this is a GCP exam. Option D is not fine-tunable, and the question said "custom, most effective model". The answer is C, since Llama 3 is fine-tunable.
upvoted 5 times
...
OpenKnowledge
Most Recent 3 weeks, 5 days ago
Selected Answer: C
Fine tune the Llama model from the Model Garden using large dataset of author works you have
upvoted 1 times
...
dija123
1 month, 1 week ago
Selected Answer: D
The question asks for a generator that can dynamically adapt to a user's writing style. This means it needs to change its behavior on the fly, for each new user and each new request.
upvoted 1 times
dija123
1 month ago
I just found this answer today: "Choose Llama 3 if your main priority is raw text-based performance, you plan to run the model locally on your own hardware, or you need strong performance on tasks like coding and reasoning." I think this makes C suit the question's requirements better than D.
upvoted 1 times
...
...
Rafa1312
1 month, 2 weeks ago
Selected Answer: C
Some sort of fine-tuning is needed for such a problem. I would have chosen D if that were the case. We can fine-tune Gemini 1.5, but the option does not say anything about tuning. Hence I am going to choose C, even though this is a GCP exam.
upvoted 2 times
...

Topic 1 Question 328

You are a lead ML architect at a small company that is migrating from on-premises to Google Cloud. Your company has limited resources and expertise in cloud infrastructure. You want to serve your models from Google Cloud as soon as possible. You want to use a scalable, reliable, and cost-effective solution that requires no additional resources. What should you do?

  • A. Configure Compute Engine VMs to host your models.
  • B. Create a Cloud Run function to deploy your models as serverless functions.
  • C. Create a managed cluster on Google Kubernetes Engine (GKE), and deploy your models as containers.
  • D. Deploy your models on Vertex AI endpoints.
Suggested Answer: D 🗳️

Comments
Fer660
2 months, 1 week ago
Selected Answer: D
Not A, B, C: All these require knowledge of cloud infrastructure, with C being the worst offender. D: lowest requirement for cloud infra knowledge.
upvoted 4 times
...
qaz09
4 months ago
Selected Answer: D
D is the one which requires no additional resources
upvoted 4 times
...

Topic 1 Question 329

You deployed a conversational application that uses a large language model (LLM). The application has 1,000 users. You collect user feedback about the verbosity and accuracy of the model's responses. The user feedback indicates that the responses are factually correct, but users want different levels of verbosity depending on the type of question. You want the model to return responses that are more consistent with users' expectations, and you want to use a scalable solution. What should you do?

  • A. Implement a keyword-based routing layer. If the user's input contains the words "detailed" or "description," return a verbose response. If the user's input contains the word "fact," re-prompt the language model to summarize the response and return a concise response.
  • B. Ask users to provide examples of responses with the appropriate verbosity as a list of question and answer pairs. Use this dataset to perform supervised fine tuning of the foundational model. Re-evaluate the verbosity of responses with the tuned model.
  • C. Ask users to indicate all scenarios where they expect concise responses versus verbose responses. Modify the application's prompt to include these scenarios and their respective verbosity levels. Re-evaluate the verbosity of responses with updated prompts.
  • D. Experiment with other proprietary and open-source LLMs. Perform A/B testing by setting each model as your application's default model. Choose a model based on the results.
Suggested Answer: A 🗳️

Comments
dija123
1 month, 1 week ago
Selected Answer: A
agree with A
upvoted 1 times
dija123
1 month, 1 week ago
Thinking B (supervised fine-tuning) could be a better answer than A
upvoted 1 times
...
...
Fer660
2 months, 1 week ago
Selected Answer: A
Not B: Why would we assume that our users are willing to do all this work? Not C: Same thing -- we can't just ask users for all this, they don't work for us. Not D: This seems to require a whole lot of work. And doing A/B testing for multiple models based on 1000 users seems likely to reach conclusions that are not statistically sound. A: Not perfect, for sure, because we are creating a declarative layer and we might have to keep adding rules to detect the customers' expectations -- seems contrary to the spirit of AI, and also violates the spirit of scalability, but I guess this is the lesser evil here?
upvoted 4 times
dija123
1 month, 1 week ago
A is not scalable indeed!
upvoted 1 times
...
...
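
Option A, taken literally, is a thin routing function in front of the LLM call. A hedged sketch (the keyword lists are from the option text; the function names and the re-prompting step itself are illustrative and stubbed out):

```python
def choose_verbosity(user_input: str) -> str:
    """Keyword-based routing layer from option A (illustrative)."""
    text = user_input.lower()
    if "detailed" in text or "description" in text:
        return "verbose"
    if "fact" in text:
        return "concise"  # would re-prompt the LLM to summarize its response
    return "default"

print(choose_verbosity("Give me a detailed description of churn"))  # verbose
print(choose_verbosity("Quick fact: when was BigQuery launched?"))  # concise
print(choose_verbosity("How are you today?"))                       # default
```

The brittleness the commenters point out is visible in the code: every new expectation means another keyword rule, which is why the "scalable" claim is debatable.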

Topic 1 Question 330

You are using Vertex AI to manage your ML models and datasets. You recently updated one of your models. You want to track and compare the new version with the previous one and incorporate dataset versioning. What should you do?

  • A. Use Vertex AI TensorBoard to visualize the training metrics of the new model version, and use Data Catalog to manage dataset versioning.
  • B. Use Vertex AI Model Monitoring to monitor the performance of the new model version, and use Vertex AI Training to manage dataset versioning.
  • C. Use Vertex AI Experiments to track and compare model artifacts and versions, and use Vertex ML Metadata to manage dataset versioning.
  • D. Use Vertex AI Experiments to track and compare model artifacts and versions, and use Vertex AI managed datasets to manage dataset versioning.
Suggested Answer: D 🗳️

Comments
dija123
1 month, 1 week ago
Selected Answer: D
A key feature of Vertex AI managed dataset service is built-in versioning
upvoted 1 times
...
spradhan
3 months, 2 weeks ago
Selected Answer: D
Vertex AI managed dataset now has versioning
upvoted 2 times
...
alja12
4 months, 1 week ago
Selected Answer: C
Vertex AI Managed Datasets don't offer dataset versioning or lineage tracking, hence C
upvoted 2 times
dija123
1 month, 1 week ago
Managed Datasets is offering versioning
upvoted 1 times
...
...

Topic 1 Question 331

You are creating a retraining policy for a customer churn prediction model deployed in Vertex AI. New training data is added weekly. You want to implement a model retraining process that minimizes cost and effort. What should you do?

  • A. Retrain the model when a significant shift in the distribution of customer attributes is detected in the production data compared to the training data.
  • B. Retrain the model when the model's latency increases by 10% due to increased traffic.
  • C. Retrain the model when the model accuracy drops by 10% on the new training dataset.
  • D. Retrain the model every week when new training data is available.
Suggested Answer: A 🗳️

Comments
Rafa1312
1 month, 2 weeks ago
Selected Answer: A
I will pick A
upvoted 1 times
...
Fer660
2 months, 1 week ago
Selected Answer: A
A is better than C. The distribution-based triggers might catch some skew that accuracy alone has a hard time detecting (e.g. with under-represented classes).
upvoted 1 times
...
qaz09
4 months ago
Selected Answer: A
A: retrains the model only when needed. B: latency is not a parameter for retraining. C: could be, but I choose A over C. D: weekly retraining might be too frequent; if we want to minimize cost, it is better to retrain when there is a visible need for it.
upvoted 1 times
...

Topic 1 Question 332

You are an AI engineer with an apparel retail company. The sales team has observed seasonal sales patterns over the past 5-6 years. The sales team analyzes and visualizes the weekly sales data stored in CSV files. You have been asked to estimate weekly sales for future seasons to optimize inventory and personnel workloads. You want to use the most efficient approach. What should you do?

  • A. Upload the files into Cloud Storage. Use Python to preprocess and load the tabular data into BigQuery. Use time series forecasting models to predict weekly sales.
  • B. Upload the files into Cloud Storage. Use Python to preprocess and load the tabular data into BigQuery. Train a logistic regression model by using BigQuery ML to predict each product's weekly sales as one of three categories: high, medium, or low.
  • C. Load the files into BigQuery. Preprocess data by using BigQuery SQL. Connect BigQuery to Looker. Create a Looker dashboard that shows weekly sales trends in real time and can slice and dice the data based on relevant filters.
  • D. Create a custom conversational application using Vertex AI Agent Builder. Include code that enables file upload functionality, and upload the files. Use few-shot prompting and retrieval-augmented generation (RAG) to predict future sales trends by using the Gemini large language model (LLM).
Suggested Answer: A 🗳️

Comments
qaz09
4 months ago
Selected Answer: A
A: a time series forecasting model is the one needed for this use case. B: an ARIMA/time series forecasting model will be better than a logistic regression model here. C: nothing about predicting values; it is just about visualising data. D: too complicated a solution; there is no need for an Agent here.
upvoted 2 times
...
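
The time series approach in option A can be baselined without any cloud service: with 5-6 years of weekly data, even a naive seasonal model (predict each week as the mean of that same week across past years) is a useful starting point before a BigQuery time series model. A sketch on synthetic data (the sales series is made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weekly sales: 5 years x 52 weeks with a seasonal pattern.
weeks = np.arange(5 * 52)
seasonal = 100 + 30 * np.sin(2 * np.pi * weeks / 52)
sales = seasonal + rng.normal(scale=5.0, size=weeks.size)

history = sales.reshape(5, 52)           # one row per year, aligned by week
forecast = history.mean(axis=0)          # seasonal-naive forecast for next year

print(f"forecast for week 0: {forecast[0]:.1f}, week 13: {forecast[13]:.1f}")
```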

Topic 1 Question 333

Your company's business stakeholders want to understand the factors driving customer churn to inform their business strategy. You need to build a customer churn prediction model that prioritizes simple interpretability of your model's results. You need to choose the ML framework and modeling technique that will explain which features led to the prediction. What should you do?

  • A. Build a TensorFlow deep neural network (DNN) model, and use SHAP values for feature importance analysis.
  • B. Build a PyTorch long short-term memory (LSTM) network, and use attention mechanisms for interpretability.
  • C. Build a logistic regression model in scikit-learn, and interpret the model's output coefficients to understand feature impact.
  • D. Build a linear regression model in scikit-learn, and interpret the model's standardized coefficients to understand feature impact.
Suggested Answer: C 🗳️

Comments

OpenKnowledge
3 weeks, 5 days ago
Selected Answer: C
Linear regression and logistic regression in scikit-learn are inherently interpretable: the relationship between features and the target variable is directly represented by the coefficients. The use case described in this question is a classification (churned / not churned) problem, so logistic regression is the choice here, which leads to Option C.
upvoted 1 times
...
Fer660
2 months, 1 week ago
Selected Answer: C
Not A: does not prioritize simple interpretability of your model's results Not B: does not prioritize simple interpretability of your model's results C: We need logistic regression because we have two classes (churned/not churned) Not D: linear regression will not help much when the labels are binary (churned/not churned)
upvoted 2 times
...
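A minimal illustration of why option C satisfies "simple interpretability" (the feature names and coefficient values below are invented): with a fitted scikit-learn `LogisticRegression`, exponentiating each coefficient gives an odds ratio a business stakeholder can read directly.

```python
import math

# Hypothetical coefficients as they would come out of a fitted
# scikit-learn LogisticRegression (model.coef_); values are invented.
coefficients = {
    "months_since_last_purchase": 0.42,
    "support_tickets_opened": 0.31,
    "has_loyalty_plan": -0.88,
}

# Each coefficient is the change in log-odds of churn per one-unit
# increase in the feature; exp(coef) is the corresponding odds ratio.
odds_ratios = {name: math.exp(coef) for name, coef in coefficients.items()}

for name, oratio in sorted(odds_ratios.items(), key=lambda kv: -kv[1]):
    direction = "raises" if oratio > 1 else "lowers"
    print(f"{name}: one-unit increase {direction} churn odds x{oratio:.2f}")
```

This direct coefficient reading is exactly what a DNN with SHAP values (option A) cannot give you as simply.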

Topic 1 Question 334

Exam Professional Machine Learning Engineer topic 1 question 334 discussion

You are responsible for managing and monitoring a Vertex AI model that is deployed in production. You want to automatically retrain the model when its performance deteriorates. What should you do?

  • A. Create a Vertex AI Model Monitoring job to track the model's performance with production data, and trigger retraining when specific metrics drop below predefined thresholds.
  • B. Collect feedback from end users, and retrain the model based on their assessment of its performance.
  • C. Configure a scheduled job to evaluate the model's performance on a static dataset, and retrain the model if the performance drops below predefined thresholds.
  • D. Use Vertex Explainable AI to analyze feature attributions and identify potential biases in the model. Retrain when significant shifts in feature importance or biases are detected.
Suggested Answer: A 🗳️

Comments

qaz09
4 months ago
Selected Answer: A
Only A addresses monitoring of performance and retraining based on this metric
upvoted 1 times
...
alja12
4 months, 1 week ago
Selected Answer: A
Question is about the performance of the model
upvoted 1 times
...
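A minimal sketch of the decision logic option A automates. The metric names and thresholds are hypothetical; in practice a Vertex AI Model Monitoring alert fires the retraining pipeline rather than hand-rolled code like this.

```python
# Retrain when any tracked metric on production data falls below its floor.
THRESHOLDS = {"auc_roc": 0.80, "precision_at_0.5": 0.70}  # predefined floors

def should_retrain(live_metrics: dict) -> bool:
    """True if any monitored metric dropped below its predefined threshold.

    A metric missing from the report defaults to 0.0, so a broken metrics
    feed also triggers retraining rather than silently passing.
    """
    return any(
        live_metrics.get(name, 0.0) < floor
        for name, floor in THRESHOLDS.items()
    )
```

Note the contrast with option C: the check runs against live production metrics, not a static dataset, so it catches drift the static set cannot show.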

Topic 1 Question 335

Exam Professional Machine Learning Engineer topic 1 question 335 discussion

You have recently developed a new ML model in a Jupyter notebook. You want to establish a reliable and repeatable model training process that tracks the versions and lineage of your model artifacts. You plan to retrain your model weekly. How should you operationalize your training process?

  • A. 1. Create an instance of the CustomTrainingJob class with the Vertex AI SDK to train your model.
    2. Using the Notebooks API, create a scheduled execution to run the training code weekly.
  • B. 1. Create an instance of the CustomJob class with the Vertex AI SDK to train your model.
    2. Use the Metadata API to register your model as a model artifact.
    3. Using the Notebooks API, create a scheduled execution to run the training code weekly.
  • C. 1. Create a managed pipeline in Vertex AI Pipelines to train your model by using a Vertex AI CustomTrainingJobOp component.
    2. Use the ModelUploadOp component to upload your model to Vertex AI Model Registry.
    3. Use Cloud Scheduler and Cloud Run functions to run the Vertex AI pipeline weekly.
  • D. 1. Create a managed pipeline in Vertex AI Pipelines to train your model using a Vertex AI HyperParameterTuningJobRunOp component.
    2. Use the ModelUploadOp component to upload your model to Vertex AI Model Registry.
    3. Use Cloud Scheduler and Cloud Run functions to run the Vertex AI pipeline weekly.
Suggested Answer: C 🗳️

Comments

OpenKnowledge
3 weeks, 5 days ago
Selected Answer: B
The Vertex AI Metadata API, specifically known as Vertex ML Metadata, is a service within Google Cloud's Vertex AI platform designed to track, manage, and analyze the metadata and artifacts generated throughout the machine learning (ML) lifecycle.
upvoted 1 times
...
dija123
1 month ago
Selected Answer: C
C directly addresses all the requirements: reliability, repeatability, versioning, lineage, and scheduling.
upvoted 1 times
...

Topic 1 Question 336

Exam Professional Machine Learning Engineer topic 1 question 336 discussion

You have developed a custom ML model using Vertex AI and want to deploy it for online serving. You need to optimize the model's serving performance by ensuring that the model can handle high throughput while minimizing latency. You want to use the simplest solution. What should you do?

  • A. Deploy the model to a Vertex AI endpoint resource to automatically scale the serving backend based on the throughput. Configure the endpoint's autoscaling settings to minimize latency.
  • B. Implement a containerized serving solution using Cloud Run. Configure the concurrency settings to handle multiple requests simultaneously.
  • C. Apply simplification techniques such as model pruning and quantization to reduce the model's size and complexity. Retrain the model using Vertex AI to improve its performance, latency, memory, and throughput.
  • D. Enable request-response logging for the model hosted in Vertex AI. Use Looker Studio to analyze the logs, identify bottlenecks, and optimize the model accordingly.
Suggested Answer: A 🗳️

Comments

Fer660
2 months, 1 week ago
Selected Answer: A
This is pretty much the textbook use case for Vertex AI endpoints, answer A.
upvoted 4 times
...
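Rough capacity math behind picking the endpoint's autoscaling bounds in option A. The QPS figures and the 4x scale-down factor are assumptions for illustration; Vertex AI then scales between `min_replica_count` and `max_replica_count` on its own based on utilization.

```python
import math

def replica_bounds(peak_qps: float, qps_per_replica: float,
                   headroom: float = 1.2) -> tuple:
    """Return (min, max) replica counts with some headroom at peak load."""
    # Enough replicas to absorb peak traffic with 20% headroom by default.
    max_replicas = math.ceil(peak_qps * headroom / qps_per_replica)
    # Keep a quarter of peak capacity warm so latency stays low off-peak.
    min_replicas = max(1, math.ceil(max_replicas / 4))
    return min_replicas, max_replicas

low, high = replica_bounds(peak_qps=500, qps_per_replica=60)
```

The resulting bounds would be passed as `min_replica_count`/`max_replica_count` when deploying the model to the Vertex AI endpoint.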

Topic 1 Question 337

Exam Professional Machine Learning Engineer topic 1 question 337 discussion

Your company needs to generate product summaries for vendors. You evaluate a foundation model from Model Garden for text summarization and find the style of the summaries are not aligned with your company's brand voice. How should you improve this LLM-based summarization model to better meet your business objectives?

  • A. Replace the pre-trained model with another model in Model Garden.
  • B. Fine-tune the model using a company-specific dataset.
  • C. Increase the model's temperature parameter.
  • D. Tune the token output limit in the response.
Suggested Answer: B 🗳️

Comments

OpenKnowledge
3 weeks, 5 days ago
Selected Answer: B
Model Garden in Vertex AI is a centralized platform for discovering, customizing, and deploying a wide variety of AI models from Google, its partners, and open-source communities. The Model Garden provides tools to tune models using your own data to adapt them to specific business needs.
upvoted 1 times
...
Fer660
2 months, 1 week ago
Selected Answer: B
Not A: unlikely that any off-the-shelf model will match your specific "voice". B: a bit of fine-tuning seems inevitable. Not C: will not address the issue of voice. Not D: will not address the issue of voice.
upvoted 2 times
...
qaz09
4 months ago
Selected Answer: B
B -> we feed the model with examples that suit our "company voice". A -> does not address the issue directly. C -> the temperature parameter controls the degree of randomness in the output (low temperature means a less creative answer, high temperature a more creative one). D -> it just changes the length of the output.
upvoted 2 times
...
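A small sketch of option B's first step: assembling a supervised tuning dataset of (product text, on-brand summary) pairs as JSONL. The example pair is invented, and the `contents`/role/parts record layout should be verified against the current Vertex AI tuning docs before use.

```python
import json

# Hypothetical (prompt, on-brand summary) pairs for supervised tuning.
examples = [
    ("Summarize: SteelFrame 2000 hiking pole, 240g, carbon shaft.",
     "Trail-ready and featherlight: the SteelFrame 2000 pairs a carbon "
     "shaft with a 240g build for all-day comfort."),
]

# Write one JSON record per line, in the user-turn / model-turn layout
# used for Gemini supervised tuning (field names are an assumption here).
with open("tuning_data.jsonl", "w") as f:
    for prompt, brand_voice_summary in examples:
        record = {
            "contents": [
                {"role": "user", "parts": [{"text": prompt}]},
                {"role": "model", "parts": [{"text": brand_voice_summary}]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```

The "model" turns are what teach the tuned model the brand voice; temperature (option C) and token limits (option D) never see this signal.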

Topic 1 Question 338

Exam Professional Machine Learning Engineer topic 1 question 338 discussion

You built a custom Vertex AI pipeline job that preprocesses images and trains an object detection model. The pipeline currently uses 1 n1-standard-8 machine with 1 NVIDIA Tesla V100 GPU. You want to reduce the model training time without compromising model accuracy. What should you do?

  • A. Reduce the number of layers in your object detection model.
  • B. Train the same model on a stratified subset of your dataset.
  • C. Update the WorkerPoolSpec to use a machine with 24 vCPUs and 1 NVIDIA Tesla V100 GPU.
  • D. Update the WorkerPoolSpec to use a machine with 24 vCPUs and 3 NVIDIA Tesla V100 GPUs.
Suggested Answer: D 🗳️

Comments

OpenKnowledge
3 weeks, 5 days ago
Selected Answer: D
A & B do not satisfy the requirement. Looking at C & D, the setup points toward a Reduction Server strategy for model training. In that strategy, the worker nodes (multiple nodes in the worker pool) need GPUs, and the single reduction server (the node that combines the results from the worker nodes into a cumulative result) needs vCPUs. So, D is the answer.
upvoted 1 times
...
dija123
1 month, 1 week ago
Selected Answer: D
3 NVIDIA Tesla V100 GPUs will reduce the training time as requested in the question.
upvoted 1 times
...
3320630
2 months, 2 weeks ago
Selected Answer: D
If you do not want to reduce the model accuracy, you need to use more compute. And since the model is trained on the GPU (the CPU is used mainly for data handling), only increasing the CPU core count would not help. So you need more GPU performance.
upvoted 3 times
...

Topic 1 Question 339

Exam Professional Machine Learning Engineer topic 1 question 339 discussion

You are a SQL analyst. You need to utilize a TensorFlow customer segmentation model stored in Cloud Storage. You want to use the simplest and most efficient approach. What should you do?

  • A. Import the model into Vertex AI Model Registry. Deploy the model to a Vertex AI endpoint, and use SQL for inference in BigQuery.
  • B. Deploy the model by using TensorFlow Serving, and call for inference from BigQuery.
  • C. Convert the model into a BigQuery ML model, and use SQL for inference.
  • D. Import the model into BigQuery, and use SQL for inference.
Suggested Answer: D 🗳️

Comments

OpenKnowledge
3 weeks, 5 days ago
Selected Answer: C
In Option C, converting the model into BigQuery ML is basically IMPORTING the model into BigQuery ML using the CREATE MODEL command. There is NOTHING called importing a model into "BigQuery". So D is not an option. Options A & B are not efficient approaches.
upvoted 1 times
...
c797628
2 months, 2 weeks ago
Selected Answer: C
The answer is C because BigQuery does not support direct model imports except through BigQuery ML.
upvoted 2 times
el_vampiro
2 months ago
D is correct. BQ can import TF model - https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-tensorflow
upvoted 2 times
...
...
spradhan
3 months, 2 weeks ago
Selected Answer: D
D is correct. A is not, because why would you deploy the TensorFlow model? Use it directly in BigQuery.
upvoted 1 times
...
ricardovazz
4 months, 1 week ago
Selected Answer: D
BigQuery ML supports importing TensorFlow models in SavedModel format directly from Cloud Storage. D
upvoted 3 times
...
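A sketch of option D, following the `CREATE MODEL ... TENSORFLOW` syntax el_vampiro links above. The dataset, model, and bucket names are hypothetical, and the statements are built as strings so the snippet stays self-contained; with `google-cloud-bigquery` installed, each would be run via `bigquery.Client().query(...)`.

```python
# Import a TensorFlow SavedModel from Cloud Storage into BigQuery ML.
create_model_sql = """
CREATE OR REPLACE MODEL `analytics.customer_segments`
OPTIONS (
  MODEL_TYPE = 'TENSORFLOW',
  MODEL_PATH = 'gs://my-bucket/segmentation_model/*'
)
"""

# After the import, inference is plain SQL -- no serving infrastructure.
predict_sql = """
SELECT *
FROM ML.PREDICT(
  MODEL `analytics.customer_segments`,
  (SELECT * FROM `analytics.customer_features`)
)
"""
```

For a SQL analyst this is the "simplest and most efficient" path in the question: no endpoint to deploy (option A) and no TensorFlow Serving cluster to operate (option B).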